PREFACE


At the present time there are very few textbooks on statistical decision 
theory, and there is apparently no textbook that gives a reasonably 
complete discussion of decision theory at an intermediate mathematical 
level. The author hopes that this textbook will help to fill the gap. It 
has as formal mathematical prerequisites only calculus through partial 
differentiation and multiple integration, plus some elementary facts 
about the use of determinants in solving linear equations. 

A few sections of the text are demanding on a student with the amount 
of mathematical maturity usually implied by a year course in calculus. 
The derivation of the Wald sequential decision rule in Chapter 7 is the 
most important example. The student will be familiar with all the
mathematical tools used, but because of the length of the development,
the instructor will have to keep the student from being overwhelmed by
the details.

Although the purpose of the text is to teach statistical decision theory, 
a substantial proportion of the text is devoted to a simple and relatively 
brief discussion of probability theory. This makes the text self- 
contained for those students who have not previously studied probability 
theory. Students who have had a course in probability theory should 
find the discussion of this theory a useful review, since the emphasis is on 
topics of probability theory most useful in statistical decision 
theory. 

Some of the topics discussed in the text are unusual for courses in the 
general field of statistics. Two examples are linear programming and 
making a sequence of nonsampling decisions over time. The author 
feels that these topics belong in a course on statistical methods. 

For reasons explained in Chapter 5, the loss is made to depend on the 
decision chosen and on chance variables that will be observed after 
the decision has been chosen, rather than on the decision chosen and the 
distribution of the chance variables. 

Chapter 9, the last chapter, introduces conventional statistical methods 
as special cases of statistical decision theory. The examples discussed in 
Chapter 9 seem a step further from practical problems than the examples 
discussed in earlier chapters. This seems to be an inherent property of 
the problems discussed in conventional statistical theory.

In a text at this level, it does not seem desirable to include detailed 
statements about which researchers are responsible for the various 
results described. Statistical decision theory is the creation of the late 
A. Wald; the proof that the Wald sequential rule is a Bayes decision rule 
is due to A. Wald and J. Wolfowitz; and the simplex method is due to 
G. B. Dantzig. 

How long a course should be devoted to the text depends on the 
maturity of the students and on the taste and style of the instructor. The 
author has used forty-five hours of lectures for a brief review of Chapters
1 to 4 and a thorough discussion of the remaining chapters, including the 
discussion of many numerical examples. This will seem to be a slow 
pace to many instructors, who will be able to cover the whole text fairly 
thoroughly in a forty-five-hour course. 

Chapter 9 serves as an introduction to conventional statistical theory, 
and it is possible to use the text for one semester and then to use a text on 
conventional statistical methods for the second semester of a year course. 

The author is indebted to Professor Sir Ronald A. Fisher, F.R.S.,
Cambridge, and to Messrs. Oliver & Boyd Ltd., Edinburgh, for
permission to reprint the tables of the chi-square and the t distribution from
"Statistical Methods for Research Workers." The author is also
indebted to Professor G. W. Snedecor and the Iowa State University
Press, Ames, Iowa, for permission to reprint the table of the F distribution
from "Statistical Methods."

For valuable typing assistance, the author is grateful to Miss Ruth 
Ritchie of the University of Virginia, Mrs. David Freeman, formerly of 
Ithaca, N.Y., Mrs. Robert Berlin of Syracuse, N.Y., and to Rhoda Weiss. 


Lionel Weiss. 
Chapter 1


ELEMENTARY PROBABILITY THEORY 


1.1. Introduction. Everything in this text is based on the mathematical 
theory of probability; therefore a simple but detailed development of this 
theory will be given. The theory will be developed in two separate ways. 
The first way is an intuitive approach; the second way is an axiomatic 
approach. The axiomatic method is neater and mathematically more
satisfactory. However, fruitful applications of probability theory to the
physical world are based on our intuitive understanding of both
probability theory and the physical world. Therefore it is essential to
develop the proper intuitive feeling for probability theory.


1.2. Intuitive Definition of Probability. When we say "the probability
of getting a head when tossing a perfectly balanced coin is equal to
1/2," we mean that when such a coin is tossed a large number of times, a
head will come up on approximately one-half of the tosses.

In general, the statement “the probability of getting the event E in one 
trial is equal to p" means that in a large number of similar trials, the 
proportion of trials in which E will occur will be close to p.

Note that in the definition we did not say how large the number of trials
should be, nor how close the proportion of occurrences of E will be to p.
All that we can say about this now is that intuitively we feel that the larger
the number of trials, the closer the proportion of occurrences will be to
the probability.

There is another aspect of our definition. If a coin when tossed gives
an exactly alternating sequence of heads and tails, extending indefinitely
as follows: head, tail, head, tail, head, tail, ..., it is true that the
proportion of times a head appears approaches 1/2 as the number of tosses
increases, but there is too much regularity here for us to apply our simple
probability theory. In other words, the trials on which a head occurs
must be scattered irregularly among all the trials. Another way of
putting this is to say that we should be able to forecast the long-run
proportion of heads only and nothing else.


In practice, we usually do not know the numerical value of the
probability of a given event. Experimentation has shown that the probability
of getting a head when tossing an ordinary coin is not exactly 1/2.
Presumably, the different designs on the two sides of the coin throw it
slightly off perfect balance. By definition, a "true" or "fair" or
"well-balanced" coin is one with the probability of a head equal to 1/2, but
such a coin may not actually exist in the physical world. Similarly, a "fair"
die is one with the probability of each face equal to 1/6. Such a die may
not exist in the physical world, but that need not prevent us from
developing a satisfactory theory based on the assumed existence of such a
fair die.
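
The intuitive definition can be illustrated by simulation. The sketch below is not part of the text: it assumes that Python's pseudorandom generator is an adequate stand-in for a well-balanced coin, and the function name relative_frequency and the fixed seed are illustrative choices only.

```python
import random

def relative_frequency(num_tosses, p_head=0.5, seed=0):
    """Toss a simulated coin num_tosses times; return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    heads = sum(1 for _ in range(num_tosses) if rng.random() < p_head)
    return heads / num_tosses

# The proportion of heads tends to settle near 1/2 as the number of tosses
# grows, although no particular number of tosses guarantees any closeness.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

The printed proportions depend on the seed; only their tendency to approach 1/2 for large numbers of tosses is the point of the sketch.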


1.3. Use of the Intuitive Definition of Probability to Develop the Basic 
Rules of Probability Theory. The identification of probabilities as
approximately equal to proportions of occurrences will enable us to 
develop certain basic rules of probability theory. 

The symbol P(E) is to be read “the probability of the event E." 

Since, for any event, the proportion of times it occurs in any sequence 
of trials is between 0 and 1, it is clear that any probability must be 
contained within the limits 0 and 1. 

If E is any event, we define the event "not E" as that event which 
occurs on each trial where E fails to occur. Thus, if E is the appearance 
of a head on a toss of a coin, then (not E) is the appearance of a tail, 
assuming the coin cannot stand on its edge. Suppose we observe N 
trials and let n(E) denote the number of trials on which E occurs, while 
n(not E) denotes the number of trials on which (not E) occurs. We must 
have n(E) + n(not E) = N. Dividing through by N, we get n(E)/N +
n(not E)/N = 1. But by our definition of probability, n(E)/N is close to
P(E), n(not E)/N is close to P(not E). Thus we have as a basic rule of
probability: for any event E, P(E) + P(not E) = 1.

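If D, E are any events, we define the event "D and E" as that event
which occurs on each trial where both the events D and E occur. Thus,
if D is the appearance of a spade when turning up the top card in a
deck while E is the appearance of an ace when turning up the top card,
the event (D and E) is the appearance of the ace of spades when turning
up the top card.

If D, E are any events, we define the event "D or E" as that event
which occurs on each trial where at least one of the events D, E occurs.
Thus, if D is the appearance of a spade as the top card of a deck while
E is the appearance of an ace, the event (D or E) is the appearance of
a card that is a spade or an ace (or both).

Suppose we observe N trials, and let n( ) denote the number of trials on
which the event in parentheses occurs. Then n(D or E) = n(D) + n(E) -
n(D and E). To see this,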
note that n(D) + n(E) would be greater than n(D or E) by exactly
n(D and E), since those trials on which both D and E occurred would be
counted in both n(D) and n(E). The following numerical example
illustrates this. Suppose each trial consists of shuffling a full deck of
cards and turning up the top card. The event D is turning up either a
jack or a queen, while the event E is turning up either a queen or a king.
Then the event (D or E) is turning up a jack, queen, or king, while the
event (D and E) is turning up a queen. Suppose 1,000 trials are observed
and a jack appears on 72 trials, a queen on 76 trials, and a king on 77
trials. Then n(D) = 72 + 76, n(E) = 76 + 77, n(D and E) = 76,
n(D or E) = 72 + 76 + 77, illustrating the fact that n(D or E) =
n(D) + n(E) - n(D and E). Dividing this last equality through by N,
we get

$$\frac{n(D \text{ or } E)}{N} = \frac{n(D)}{N} + \frac{n(E)}{N} - \frac{n(D \text{ and } E)}{N}$$


But in this equality, each ratio is the proportion of times an event occurs, 
and these proportions are close to the corresponding probabilities, by 
our definition of probability. This gives another basic rule of
probability theory: for any events D, E, P(D or E) = P(D) + P(E) - P(D and E).

Whenever the events D, E are such that it is impossible for them both 
to occur on the same trial, they are said to be “mutually exclusive." As 
an example, a trial might consist of turning up the top card of a deck, D 
might be the appearance of a heart as the top card, E might be the 
appearance of a black card as top card. If D, E are mutually exclusive, 
then P(D and E) = 0, and P(D or E) = P(D) + P(E).
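
The counting argument behind the rule for (D or E) can be checked directly with the counts quoted in the card example of this section. The following is only a bookkeeping sketch; the variable names are illustrative and do not come from the text.

```python
# Counts from the numerical example: 1,000 trials of turning up the top card.
N = 1000
jacks, queens, kings = 72, 76, 77

n_D = jacks + queens               # D: a jack or a queen turns up
n_E = queens + kings               # E: a queen or a king turns up
n_D_and_E = queens                 # both D and E occur: a queen turns up
n_D_or_E = jacks + queens + kings  # at least one of D, E occurs

# n(D) + n(E) counts the queen trials twice, so they are subtracted once.
assert n_D_or_E == n_D + n_E - n_D_and_E

# Dividing every count by N gives the same relation between proportions.
proportions = {
    "D": n_D / N,
    "E": n_E / N,
    "D and E": n_D_and_E / N,
    "D or E": n_D_or_E / N,
}
print(proportions)
```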


1.4. Intuitive Definition of Conditional Probability. We have defined
the probability of an event E as the approximate proportion of trials on
which E occurs in a long series of trials. Now we define the "conditional
probability of E given that D occurs" as the approximate proportion of
trials where E occurs among those trials where D occurs; that is, in
computing the proportion, we disregard all trials where D does not
occur. The symbol P(E | D) denotes the conditional probability of E,
given that D occurs.

Suppose we observe N trials and n( ) denotes the number of trials
where the event in parentheses occurs. The proportion of trials where
E occurs, among those trials where D occurs, is equal to

$$\frac{n(E \text{ and } D)}{n(D)}$$

which is equal to

$$\frac{n(E \text{ and } D)/N}{n(D)/N}$$

But this last expression is close to P(E and D)/P(D). Thus we are led to
define P(E | D) as P(E and D)/P(D). [Here we assume P(D) > 0.]
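
As a hedged illustration of this definition, the sketch below computes a conditional probability for the top card of a deck, assuming (as an idealization) that each of the 52 cards is equally likely to be on top. The helper names prob and cond_prob and the card encoding are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely top cards (assumed)

def prob(event):
    """P(event) when every card is equally likely to be the top card."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def cond_prob(e, d):
    """P(E | D) = P(E and D) / P(D); defined only when P(D) > 0."""
    p_d = prob(d)
    if p_d == 0:
        raise ValueError("P(D) must be positive")
    return prob(lambda card: e(card) and d(card)) / p_d

def is_spade(card):
    return card[1] == "spades"

def is_ace(card):
    return card[0] == "A"

print(cond_prob(is_ace, is_spade))  # 1/13
print(cond_prob(is_spade, is_ace))  # 1/4
```

Exact fractions are used so that the ratio P(E and D)/P(D) is not blurred by rounding.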


1.5. Independent Events. The statement “the event E is independent 
of the event D” is defined to mean that P(E | D) = P(E). The reason 
for the use of the word "independent" here is that the probability of E 
remains the same whether we know that D occurs or not; so knowledge 
about D does not change our knowledge about E.

Our definition of P(E | D) implies that P(E and D) = P(D)P(E | D). 
If E is independent of D, we then have P(E and D) = P(E)P(D).
Conversely, if P(E and D) = P(E)P(D), we have

$$P(E \mid D) = \frac{P(E \text{ and } D)}{P(D)} = \frac{P(E)P(D)}{P(D)} = P(E)$$

so E is independent of D. Thus we could define E's independence of D
by the equality P(E and D) = P(E)P(D).

Similarly, we define "D is independent of E" to mean P(D | E) = P(D),
and we find that this is equivalent to P(E and D) = P(E)P(D). This
means that if D is independent of E, then E is independent of D, and
vice versa, so it is customary to say simply that "D and E are
independent," which is equivalent to any of the equalities P(D | E) = P(D),
P(E | D) = P(E), P(D and E) = P(D)P(E).

As an example of independent events, suppose a trial consists of
thoroughly shuffling a full deck of cards and turning up the top card;
D is the appearance of a spade as top card; E is the appearance of a
picture card as top card. By the intuitive meaning of "thorough
shuffling," we have P(D) = 13/52, P(E) = 12/52, and P(D and E) = 3/52.
Since in this case P(D and E) = P(D)P(E), the events D and E are
independent.

As an example of events which are not independent, suppose a trial
consists of rolling a fair die once; the event D is the appearance of an even
number of spots, while the event E is the appearance of fewer than four
spots. Then P(D) = 3/6, P(E) = 3/6, P(D and E) = 1/6, and since
P(D and E) is not equal to P(D)P(E), the events D and E are not
independent in this case.

When the occurrence of the event D depends on an experiment that has
no physical connection with the experiment determining the occurrence
of the event E, it is always assumed that the events D and E are
independent. Thus, if we toss a penny and a nickel and D is the appearance
of a head on the penny and E is the appearance of a tail on the nickel, D
and E are independent events.
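
The product criterion P(D and E) = P(D)P(E) is easy to test mechanically. The sketch below applies it to a deck example (spade, picture card) and a die example (even number, fewer than four spots), assuming equally likely outcomes in both cases; the encoding of cards and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def prob(outcomes, event):
    """P(event) when all listed outcomes are equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(outcomes, d, e):
    """True when P(D and E) equals P(D) P(E)."""
    p_both = prob(outcomes, lambda o: d(o) and e(o))
    return p_both == prob(outcomes, d) * prob(outcomes, e)

# Deck: ranks 1..13 (11, 12, 13 stand for jack, queen, king), suits S, H, D, C.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

def spade(card):
    return card[1] == "S"

def picture(card):
    return card[0] >= 11

# Die: faces 1..6.
die = list(range(1, 7))

def even(face):
    return face % 2 == 0

def fewer_than_four(face):
    return face < 4

print(independent(deck, spade, picture))        # True
print(independent(die, even, fewer_than_four))  # False
```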

1.6. Additional Basic Rules of Probability. Given any r events
E_1, ..., E_r, the event "E_1 or E_2 or ... or E_r" is defined as that
event which occurs on any trial where at least one of the r events
occurs. The event "E_1 and E_2 and ... and E_r" is defined as that event
which occurs on any trial where all r events occur.

The events E_1, E_2, ..., E_r are called "mutually exclusive by pairs" if
no two of them can occur on the same trial. If N trials are made, and if
E_1, ..., E_r are mutually exclusive by pairs, we must have
n(E_1 or E_2 or ... or E_r) = n(E_1) + n(E_2) + ... + n(E_r), and
dividing through by N, we get

$$\frac{n(E_1 \text{ or } E_2 \text{ or } \cdots \text{ or } E_r)}{N} = \frac{n(E_1)}{N} + \frac{n(E_2)}{N} + \cdots + \frac{n(E_r)}{N}$$

Since each ratio is close to the corresponding probability, we get the
following basic rule: if E_1, E_2, ..., E_r are any events which are
mutually exclusive by pairs, then P(E_1 or E_2 or ... or E_r) =
P(E_1) + P(E_2) + ... + P(E_r).

If D_1, D_2, ..., D_r are any events, they are called "mutually
independent" if P(D_a and D_b and ... and D_n) = P(D_a)P(D_b) ... P(D_n),
where a, b, ..., n are any integers all different from each other and all
between 1 and r. This definition of independence is a generalization of
the definition of the independence of two events. If each of the events
D_1, D_2, ..., D_r is defined by an experiment physically separated from
the experiments determining the other D's, then D_1, D_2, ..., D_r are
assumed to be mutually independent.
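
The definition of mutual independence requires the product rule for every sub-collection of the events, not only for pairs. The sketch below checks this by brute force for equally likely outcomes. The second example (two tosses, with a third event "the tosses agree") is not from the text; it is added only to show a case where every pair passes the test but the full collection does not.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def prob(outcomes, event):
    """P(event) when all listed outcomes are equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def mutually_independent(outcomes, events):
    """Check the product rule for every sub-collection of two or more events."""
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            p_all = prob(outcomes, lambda o: all(ev(o) for ev in subset))
            if p_all != prod(prob(outcomes, ev) for ev in subset):
                return False
    return True

# Three tosses of a fair coin; D_i is "head on toss i".
three_tosses = list(product("HT", repeat=3))
heads = [lambda o, i=i: o[i] == "H" for i in range(3)]
print(mutually_independent(three_tosses, heads))  # True

# Two tosses; the third event is "both tosses show the same face".
two_tosses = list(product("HT", repeat=2))
events = [lambda o: o[0] == "H", lambda o: o[1] == "H", lambda o: o[0] == o[1]]
print(mutually_independent(two_tosses, events))   # False
```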


1.7. Axiomatic Development of Elementary Probability Theory. All 
the rules of probability that we derived above can be derived very 
easily from a simple axiom system. The following is a sketch of this 
approach. 

We start with a given finite number k of "fundamental occurrences,"
which we represent by the symbols F_1, F_2, ..., F_k. The mathematical
theory says no more to define these fundamental occurrences, but they
are meant to correspond to all possible mutually exclusive outcomes of
an experiment. Thus, if the experiment is rolling a die, k = 6 and each
fundamental occurrence corresponds to one of the six faces of the die.
If the experiment is picking the top card from a deck of cards, k = 52 and
each fundamental occurrence corresponds to one of the 52 cards. If
the experiment is picking the top five cards from a deck of cards,
k = (52)(51)(50)(49)(48) and each fundamental occurrence corresponds to
one of the ways of picking a sequence of five different cards.

Attached to each fundamental occurrence is a nonnegative number.
The number attached to F_i will be denoted by p_i and is called "the
probability of F_i." p_1 + p_2 + ... + p_k = 1. These p's may be
assigned arbitrarily, provided they are nonnegative and sum to unity.
However, p_i is of course meant to correspond to the proportion of times
F_i will occur in a long sequence of performances of the experiment, and
in any application of the theory p_i is set accordingly. Thus, if we apply
the theory to the experiment which consists of rolling a well-balanced die,
k = 6 and each p_i is set equal to 1/6. In fact, this would be the
mathematical definition of "well-balanced die."

Any given set of fundamental occurrences is called an "event." The
probability of any event is defined as the sum of the probabilities attached
to all the fundamental occurrences in the event. (In listing the funda- 
mental occurrences in an event, it is important to list each only once, to 
avoid double counting.) An event is considered to occur on any trial 
where one of the fundamental occurrences in the event occurs, and this is 
consistent with the definition of the probability of the event. 

The set consisting of no fundamental occurrences is also an event (often 
called the "impossible event"), and its probability is defined as zero. 

Given any event C, the event (not C) is defined as the set of all
fundamental occurrences which are not in the event C. From this, it follows
immediately that P(C) + P(not C) = 1.

If D, E are any two events, the event (D or E) is defined as the set of all
fundamental occurrences appearing in either of the events D or E or in
both, while the event (D and E) is defined as the set of all fundamental
occurrences appearing in both the event D and the event E. From these
definitions, it is easily seen that the event (D or E) occurs whenever at
least one of the events D, E occurs, while the event (D and E) occurs
whenever both events D, E occur.
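
These set-theoretic definitions map directly onto set operations. The sketch below is one possible rendering, assuming the well-balanced die as the experiment; the names p, P, and the particular events are illustrative choices.

```python
from fractions import Fraction

# Fundamental occurrences for one roll of a well-balanced die, each p_i = 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(p.values()) == 1  # the p's must sum to unity

def P(event):
    """Probability of an event: the sum of p_i over its fundamental occurrences.

    Representing an event as a Python set lists each occurrence only once.
    """
    return sum((p[f] for f in event), Fraction(0))

all_occurrences = set(p)
C = {2, 4, 6}
not_C = all_occurrences - C  # the event (not C)
D = {1, 2, 3}
E = {2, 4, 6}

assert P(set()) == 0                       # the impossible event
assert P(C) + P(not_C) == 1
assert P(D | E) == P(D) + P(E) - P(D & E)  # (D or E) is union, (D and E) is intersection
print(P(D | E), P(D & E))                  # 5/6 1/6
```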

Two events A, B are called "mutually exclusive" if there is no
fundamental occurrence which is in both A and B. This means that the event
(A and B) contains no fundamental occurrences, so P(A and B) = 0.
Similarly, the events A_1, A_2, ..., A_r are called "mutually exclusive by
pairs" if there is no fundamental occurrence which is in more than one of
the events A_1, A_2, ..., A_r.

Suppose we represent each fundamental occurrence as a cross in a
diagram and represent an event by enclosing the set of fundamental
occurrences comprising the event (Fig. 1.1). The event (D or E) consists
of all the fundamental occurrences in the hatched portion, while the event
(D and E) consists of all the fundamental occurrences in the part of the
hatched portion common to D and E. Recalling that the probability of an
event is the sum of the probabilities of the fundamental occurrences in
the event (counting each fundamental occurrence only once), it is easily
seen from the diagram that P(D or E) = P(D) + P(E) - P(D and E).

Defining the event (B or C or D) as the set of fundamental occurrences
appearing in at least one of the events B, C, D and defining the event
(B and C and D) as the set of fundamental occurrences appearing in all
three events B, C, D, a diagram similar to Fig. 1.1 can be used to prove
the following formula:

$$P(B \text{ or } C \text{ or } D) = P(B) + P(C) + P(D) - P(B \text{ and } C) - P(B \text{ and } D) - P(C \text{ and } D) + P(B \text{ and } C \text{ and } D)$$
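
The three-event formula can be spot-checked numerically. The sketch below uses twelve fundamental occurrences with deliberately unequal probabilities, to suggest that the formula does not rely on equal likelihood. The weights and the events B, C, D are arbitrary choices made for the illustration, not taken from the text.

```python
from fractions import Fraction

# Twelve fundamental occurrences 1..12 with p_i = i/78 (1 + 2 + ... + 12 = 78).
weights = {i: Fraction(i, 78) for i in range(1, 13)}
assert sum(weights.values()) == 1

def P(event):
    """Sum of the probabilities of the fundamental occurrences in the event."""
    return sum((weights[i] for i in event), Fraction(0))

B = {i for i in weights if i % 2 == 0}  # even labels
C = {i for i in weights if i % 3 == 0}  # labels divisible by 3
D = {i for i in weights if i <= 6}      # the six smallest labels

lhs = P(B | C | D)
rhs = (P(B) + P(C) + P(D)
       - P(B & C) - P(B & D) - P(C & D)
       + P(B & C & D))
print(lhs, rhs)
```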


Fig. 1.1


1.8. Examples of the Applications of the Basic Rules of Probability. In
this section we discuss three examples of probability calculations
involving a fairly elaborate use of the basic rules of probability theory.


Example 1. Suppose we have a coin with probability p of coming up
heads on each single toss, and we toss the coin m times. We want to find
the probability that heads will appear on exactly k of the tosses, where k
is a given integer between 0 and m.

Here a trial consists of m separate tosses. We keep track of what
happens on a trial by listing the result of the first toss, the second toss,
..., the mth toss. Thus there are 2^m different possible outcomes of the
experiment, each outcome being described by a sequence of H's (for
heads) and T's (for tails); m symbols in all. Thus the sequence

    HH ... HH          (m H's)

represents the appearance of a head on every one of the m tosses; the
sequence

    HH ... H TT ... T  (k H's followed by m - k T's)

represents the appearance of a head on each of the first k tosses, and a tail
on each of the last m - k tosses. Each of these sequences represents a
fundamental occurrence.

Our next task is to assign a probability to each fundamental occurrence. 
This should be done in a way that takes account of the description of 
the physical circumstances of the problem. Since there is no physical 
connection between the different tosses of the coin, an event defined in 
terms of the first toss is independent of an event defined in terms of the 
second toss, or the third toss, etc. Then, by the rule of Sec. 1.6, we
should find the probability of the fundamental occurrence HH ... HH,
for example, by noting that this fundamental occurrence is the occurrence
of (head on toss 1 and head on toss 2 and ... and head on toss m), and the
probability of this is P(head on toss 1)P(head on toss 2) ... P(head on
toss m). But P(head on toss i) was specified as p, and therefore the
probability of the fundamental occurrence HH ... HH is set at p^m. By
the same sort of reasoning, any fundamental occurrence with exactly r
H's (and therefore m - r T's) is assigned the probability p^r(1 - p)^(m-r)
(recalling that the probability of a tail on any given toss is 1 - p). This
gives the complete assignment of probabilities to fundamental
occurrences.

Now we return to our original problem, finding the probability of
getting exactly k heads. The event (k heads occur) consists of all
fundamental occurrences containing exactly k H's, and P(k heads occur)
is equal to the sum of the probabilities attached to these fundamental
occurrences. There are m!/[k!(m - k)!] such fundamental occurrences,
since this is the number of ways of choosing k places out of m places,
and each such fundamental occurrence has been assigned the probability
p^k(1 - p)^(m-k). Thus we have

$$P(k \text{ heads occur}) = \frac{m!}{k!\,(m-k)!}\, p^k (1-p)^{m-k}$$
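
The reasoning of Example 1 can be checked by brute force for small m: enumerate all 2^m sequences, attach p^r(1 - p)^(m-r) to each, and add up those with exactly k H's. The sketch below does this and compares the total with the counting formula m!/[k!(m - k)!] p^k(1 - p)^(m-k). The function names and the particular values of m and p are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_k_heads_by_enumeration(m, k, p):
    """Sum the probabilities of all H/T sequences of length m with exactly k H's."""
    total = Fraction(0)
    for seq in product("HT", repeat=m):  # the 2^m fundamental occurrences
        r = seq.count("H")
        if r == k:
            total += p**r * (1 - p)**(m - r)
    return total

def prob_k_heads_by_formula(m, k, p):
    """Number of sequences with k H's times the probability of each."""
    return comb(m, k) * p**k * (1 - p)**(m - k)

p = Fraction(1, 3)
m = 6
for k in range(m + 1):
    assert prob_k_heads_by_enumeration(m, k, p) == prob_k_heads_by_formula(m, k, p)

# The probabilities for k = 0, 1, ..., m account for every fundamental occurrence.
assert sum(prob_k_heads_by_formula(m, k, p) for k in range(m + 1)) == 1
```

Enumeration is practical only for small m, since the number of sequences doubles with each additional toss.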

Example 2. We choose the top five cards from a well-shuffled deck of
cards. We want to find the probability that these five cards consist of
three spades and two clubs.

Here a fundamental occurrence is a sequence of five different cards, so
there are (52)(51)(50)(49)(48) fundamental occurrences. By the intuitive
meaning of "well-shuffled," we assign the same
probability [(52)(51)(50)(49)(48)]^(-1) to each fundamental occurrence,
to make the sum of the probabilities equal to 1.
To find the desired probability of getting three spades and two clubs, it
is merely necessary to multiply the number of fundamental occurrences
in this event by the common probability assigned to each fundamental
occurrence. We find the number of fundamental occurrences in the
event as follows. For each specification of three particular spades and
two particular clubs, there are 5! different fundamental occurrences
containing those particular cards, since each permutation of the cards is a
different fundamental occurrence. But there are 13!/(3! 10!) different
ways of choosing 3 spades out of the 13 spades and 13!/(2! 11!) of
choosing 2 clubs out of the 13 clubs. Therefore the number of different
fundamental occurrences in the event is

$$5! \cdot \frac{13!}{3!\,10!} \cdot \frac{13!}{2!\,11!}$$
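
The count in Example 2 can be evaluated mechanically. The sketch below multiplies it out and divides by the number of ordered five-card sequences. The cross-check against unordered hands is an addition of this sketch, not a step in the text.

```python
from fractions import Fraction
from math import comb, factorial

# Number of fundamental occurrences: ordered sequences of five different cards.
sequences = 52 * 51 * 50 * 49 * 48

# Occurrences in the event: 5! orderings for each choice of 3 spades and 2 clubs.
favorable = factorial(5) * comb(13, 3) * comb(13, 2)

probability = Fraction(favorable, sequences)

# Counting unordered five-card hands instead leads to the same probability,
# because every unordered hand corresponds to exactly 5! ordered sequences.
assert probability == Fraction(comb(13, 3) * comb(13, 2), comb(52, 5))

print(favorable, sequences, float(probability))
```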


Example 3. There are two boxes, labeled I, II. Box I contains 3 red
cards and 4 black cards; box II contains 7 red cards and 5 black cards.
The cards in box I are shuffled thoroughly; then a card is drawn from box
I and (without being observed) is transferred to box II. Then the cards
in box II are shuffled thoroughly, and one is drawn. The problem is to
find the probability that the card drawn from box II is red.

In order to keep track of the fundamental occurrences in this example,
we imagine that each card is labeled as follows. The 3 red cards in
box I are labeled Ir1, Ir2, Ir3, respectively; the 4 black cards in box I
are labeled Ib1, Ib2, Ib3, Ib4, respectively; the 7 red cards in box II are
labeled IIr1, IIr2, ..., IIr7, respectively; the 5 black cards in box II are
labeled IIb1, IIb2, ..., IIb5, respectively. Then a fundamental occurrence
is described by listing a card whose label starts with I, and then
listing either the same card or a card whose label starts with II. The
first card listed is the card transferred from box I to box II, and the
second card listed is the card finally drawn from box II. There are
(7)(13) = 91 different fundamental occurrences.

Our next step is to assign a probability to each fundamental occurrence,
in accordance with the physical description of the experiment.
Let us examine the fundamental occurrence (Ir1, Ir1). This means that
Ir1 is transferred, and then Ir1 is drawn. The probability that Ir1 is
transferred is equal to 1/7, by the intuitive meaning of thorough shuffling.
Again by the intuitive meaning of thorough shuffling, the conditional
probability that Ir1 is drawn from box II, given that Ir1 was transferred
from box I, is 1/13. From Sec. 1.4, P(D and E) = P(D)P(E | D), for
any events D, E. Denoting by D the event (Ir1 is transferred from box I)
and by E the event (Ir1 is drawn from box II), we find that P(D and E) =
P(Ir1, Ir1) = (1/7)(1/13) = 1/91. Thus the natural probability to assign
to the fundamental occurrence (Ir1, Ir1) is 1/91. Exactly the same
reasoning shows that the natural probability to assign to each of the 91
fundamental occurrences is 1/91.

Next we count the number of different fundamental occurrences in the
event (card drawn from box II is red). There are 8 different fundamental
occurrences in this event starting with Ir1, 8 starting with Ir2, 8 starting
with Ir3, 7 starting with Ib1, 7 starting with Ib2, 7 starting with Ib3, 7
starting with Ib4. Altogether, this is a total of (3)(8) + (4)(7) = 52
different fundamental occurrences in the event. Therefore P(card
drawn from box II is red) = 52/91.

It is interesting to note that this probability can be found by the
following short intuitive argument. When we draw from box II, there
are 13 cards in the box, and on the average 7 3/7 of these cards are red: 7 red
cards originally in box II plus 3/7 of a red card transferred from box I "on
the average." Therefore the proportion of red cards in box II will be
(7 3/7)/13 "on the average," and since the shuffling is thorough, the
probability of drawing a red card from box II is (7 3/7)/13 = 52/91.
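
Both routes to the answer in Example 3 can be reproduced in a few lines. The sketch below builds the 91 fundamental occurrences from the card labels, counts those in which the card drawn from box II is red, and then repeats the "on the average" shortcut. The label strings follow the labeling scheme of the example; the helper is_red is an illustrative assumption.

```python
from fractions import Fraction

box1 = [f"Ir{i}" for i in range(1, 4)] + [f"Ib{i}" for i in range(1, 5)]
box2 = [f"IIr{i}" for i in range(1, 8)] + [f"IIb{i}" for i in range(1, 6)]

# A fundamental occurrence is (card transferred, card drawn from box II);
# after the transfer, box II holds its own 12 cards plus the transferred card.
occurrences = [(moved, drawn) for moved in box1 for drawn in box2 + [moved]]
prob_each = Fraction(1, 7) * Fraction(1, 13)  # P(D) P(E | D) = 1/91

def is_red(label):
    """Red cards are the ones whose label contains the letter r."""
    return "r" in label

red_drawn = [occ for occ in occurrences if is_red(occ[1])]
probability = len(red_drawn) * prob_each

# The short intuitive argument: 7 3/7 red cards "on the average" among 13.
shortcut = (7 + Fraction(3, 7)) / 13

print(len(occurrences), len(red_drawn), probability == shortcut)  # 91 52 True
```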

In many problems, counting the number of fundamental occurrences
in a given event may be a very difficult task, and various ingenious
formulas and methods have been developed to aid in the counting.
However, our interest is in the concepts of probability theory rather than
in computational techniques.




Chapter 2 


CHANCE VARIABLES WITH 
A FINITE NUMBER OF 
POSSIBLE VALUES 


2.1. Introduction. The fundamental occurrences of a given experiment
may be any of a great variety of different objects: playing cards,
sides of a coin, colors of a chip, etc. However, in most of the remainder
of this textbook we shall be discussing experiments each of whose
outcomes is a number, so that each fundamental occurrence is a number.
Such an experiment is said to define a "chance variable."

The following intuitive definition of chance variable may be useful in
developing the proper feeling for this concept: A chance variable is the
number that will appear when the experiment is performed. Thus a
chance variable is to be regarded as a number that has not yet been
observed and is still to be chosen by a chance mechanism. A number
that has already been observed is called an "observation," not a chance
variable.

The terms "random variable," "stochastic variable," and "variate"
are all synonyms for chance variable.


2.2. Probability Distributions. Since a chance variable is defined as a
number to be determined by an experiment and is always in the future, all
we can know about a chance variable is a table listing the possible values
it can have, along with the respective probabilities of these values. Such
a table is called the "probability distribution" of the chance variable.
For example, the chance variable defined as the number that will appear
when a well-balanced die is rolled has the following probability
distribution:

Possible values |  1    2    3    4    5    6
Probabilities   | 1/6  1/6  1/6  1/6  1/6  1/6

As another example, if our experiment is throwing a fair coin twice, and
the chance variable is defined as the number of throws on which a head
will come up, then this chance variable has the following probability
distribution:

Possible values |  0    1    2
Probabilities   | 1/4  1/2  1/4

(These probabilities are derived from the first example of Sec. 1.8.)

As a matter of notation, capital letters near the end of the alphabet,
such as W, X, Y, or Z, shall denote chance variables. When
writing general formulas, it will be convenient to use the following
notation to represent a general probability distribution for a chance
variable X:

Possible values | x_1  x_2  ...  x_r
Probabilities   | p_1  p_2  ...  p_r

where of course p_1, p_2, ..., p_r are nonnegative numbers adding to 1.

Before concluding this section, we emphasize that the probability
distribution of a chance variable X contains all the information it is
possible to have about the chance variable. This is so because a chance
variable is defined as a value to be determined by an experiment, and
therefore all we can know about the chance variable are its possible
values with their probabilities. From our point of view, a chance
variable is described by its probability distribution rather than by a
physical experiment. Rolling a well-balanced die is one physical
experiment, and picking the top card from a well-shuffled deck of six cards
labeled from 1 to 6 is another physical experiment, but both define the
same chance variable, the one with the probability distribution

Possible values |  1    2    3    4    5    6
Probabilities   | 1/6  1/6  1/6  1/6  1/6  1/6
As we shall see, statistical problems are those problems which arise
when we are dealing with chance variables whose probability distributions
are not completely known.


2.3. Expected Values. Suppose the chance variable X has the
probability distribution

Possible values | x_1  x_2  ...  x_r
Probabilities   | p_1  p_2  ...  p_r

and suppose g(x) is a given function of x. Then the symbol E{g(X)} is
read "the expected value of g(X)" and is defined as the value
p_1 g(x_1) + p_2 g(x_2) + ... + p_r g(x_r).

As a numerical example, if X has the probability distribution

Possible values | -1    1
Probabilities   | 1/2  1/2

then E{X} = (1/2)(-1) + (1/2)(1) = 0; E{X^2} = (1/2)(-1)^2 + (1/2)(1)^2 = 1;
E{3X + 2} = (1/2)[3(-1) + 2] + (1/2)[3(1) + 2] = 2.

As another numerical example, if the probability distribution is

Possible values |  1    2    3    4    5    6
Probabilities   | 1/6  1/6  1/6  1/6  1/6  1/6

then E{X} = (1/6)(1) + (1/6)(2) + (1/6)(3) + (1/6)(4) + (1/6)(5) + (1/6)(6) = 7/2;
E{(X - 2)^2} = (1/6)(1 - 2)^2 + (1/6)(2 - 2)^2 + (1/6)(3 - 2)^2 +
(1/6)(4 - 2)^2 + (1/6)(5 - 2)^2 + (1/6)(6 - 2)^2 = 31/6.
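
The definition of E{g(X)} translates into a one-line sum over the distribution table. The sketch below is an illustration under stated assumptions: a distribution is stored as a dictionary from possible values to probabilities, the function name expected_value is invented for the example, and the second distribution is taken to be that of a well-balanced die.

```python
from fractions import Fraction

def expected_value(distribution, g=lambda x: x):
    """E{g(X)} = p_1 g(x_1) + ... + p_r g(x_r) for a {value: probability} table."""
    assert sum(distribution.values()) == 1  # the probabilities must add to 1
    return sum(p * g(x) for x, p in distribution.items())

half = Fraction(1, 2)
X = {-1: half, 1: half}
print(expected_value(X))                       # 0
print(expected_value(X, lambda x: x**2))       # 1
print(expected_value(X, lambda x: 3 * x + 2))  # 2

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))                         # 7/2
print(expected_value(die, lambda x: (x - 2)**2))   # 31/6
```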
E{g(X)} has an extremely important physical interpretation, which we
now discuss. First we note that whatever the probability distribution of
the chance variable X is, we can construct a physical experiment which
defines X. For example, if we are given the probability distribution


Possible values | —114 2 7 15% 
ni 
Probabilities i X M M 


We can mount a "well-balanced" arrowheaded spinner on a round dial of
unit circumference and mark the circumference off into arcs whose lengths
are the respective probabilities. These arcs are labeled with the
corresponding possible values. The experiment is performed by spinning
the spinner, the outcome being the number labeling the arc in which the
arrowhead comes to rest. The chance variable defined by this experiment
has the given probability distribution.
Now, given a chance variable X with the probability distribution

Possible values    x_1   x_2   ...   x_k
Probabilities      p_1   p_2   ...   p_k

suppose we construct a physical experiment defining X and perform it
N times, keeping a record of the results. We denote the number that
comes up on the ith performance of the experiment by t_i, so the set of
numbers t_1, t_2, ..., t_N constitutes our record of the results of the
N performances of the experiment. Some of the values t_1, t_2, ..., t_N
will be equal to x_1; some will be equal to x_2; etc. We denote by n(x_i)
the number of values t_1, t_2, ..., t_N which equal x_i. We have
n(x_1) + n(x_2) + ... + n(x_k) = N. The average of the N quantities
g(t_1), ..., g(t_N) is

[g(t_1) + g(t_2) + ... + g(t_N)] / N

and by collecting equal terms in the numerator, we can write this average
as

[n(x_1) g(x_1) + n(x_2) g(x_2) + ... + n(x_k) g(x_k)] / N

which equals

[n(x_1)/N] g(x_1) + [n(x_2)/N] g(x_2) + ... + [n(x_k)/N] g(x_k)

But n(x_i)/N is the proportion of the N trials in which the outcome was x_i,
and by our intuitive definition of probability, we expect n(x_i)/N to be
close to p_i if N is large. Therefore, if N is large, we expect the average of
the N quantities g(t_1), g(t_2), ..., g(t_N) to be close to
p_1 g(x_1) + p_2 g(x_2) + ... + p_k g(x_k), which is E{g(X)}. Thus E{g(X)}
is a forecast of the average of the quantities g(t_1), g(t_2), ..., g(t_N).
This interpretation of an expected value as a forecast of an observed
average is extremely important for statistical theory.
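
The forecast interpretation can be tried out numerically. The sketch below is only an illustration: it assumes the well-balanced die as the experiment, takes g(x) = (x - 2)^2, and uses Python's pseudo-random generator in place of physical trials. The function names are hypothetical.

```python
import random
from fractions import Fraction

def expected_value(values, probs, g):
    """E{g(X)} = p_1 g(x_1) + ... + p_k g(x_k) for a finite table."""
    return sum(p * g(x) for x, p in zip(values, probs))

def observed_average(values, probs, g, n_trials, seed=0):
    """Average of g(t_1), ..., g(t_N) over N simulated performances."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=[float(p) for p in probs], k=n_trials)
    return sum(g(t) for t in draws) / n_trials

# Well-balanced die: possible values 1..6, each with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
g = lambda x: (x - 2) ** 2

forecast = expected_value(values, probs, g)            # exact value of E{g(X)}
average = observed_average(values, probs, g, 100_000)  # simulated average
print(forecast, average)
```

With N as large as 100,000 the simulated average should land near the exact expected value; a smaller N gives a rougher agreement.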


2.4. Cumulative Distribution Functions. If X is a chance variable
with a given probability distribution, then for any given value x, we can
compute P(X ≤ x). P(X ≤ x) is a function of x, and not of the chance
variable X, though the form of P(X ≤ x) depends on the probability
distribution of X. The function P(X ≤ x) is called the "cumulative
distribution function for X."

As a numerical example, suppose X has the probability distribution



Possible values    -1    0    2
Probabilities Y$ қ 4 
Then it is easy to verify that
P(X ≤ x) = 0                          if x < -1
P(X ≤ x) = P(X = -1)                  if -1 ≤ x < 0
P(X ≤ x) = P(X = -1) + P(X = 0)       if 0 ≤ x < 2
P(X ≤ x) = 1                          if 2 ≤ x

The graph of P(X ≤ x) is given in Fig. 2.1.
In general, if the chance variable X has the probability distribution

Possible values    x_1   x_2   ...   x_k
Probabilities      p_1   p_2   ...   p_k

then the function P(X ≤ x) has discontinuities at the points x_1, x_2, ..., x_k,
the height of the jump at x_i being equal to p_i. Between points of
discontinuity, the function P(X ≤ x) has zero slope.

Fig. 2.1

From this description, it is clear that if we know the cumulative distribution
function P(X ≤ x), we can deduce the probability distribution of X: the
possible values are the points on the x axis where P(X ≤ x) jumps, and the
corresponding probabilities are the heights of the jumps. Thus the cumulative
distribution function gives just as much information about the chance
variable as the probability distribution in table form, which is all the
information it is possible to have about a chance variable. The cumulative
distribution function is more generally useful than the probability
distribution in table form, as we shall see.

The following formula is useful: If b, c are any values with b < c, then
P(b < X ≤ c) = P(X ≤ c) - P(X ≤ b). To prove this, denote the
event (b < X ≤ c) by E_1, the event (X ≤ b) by E_2, and the event (X ≤ c)
by E_3. The events E_1, E_2 are mutually exclusive, and the event E_3 is
exactly the same event as (E_1 or E_2). Therefore P(E_3) = P(E_1) + P(E_2),
or P(X ≤ c) = P(b < X ≤ c) + P(X ≤ b), which proves the formula.

From now on, we shall denote the phrase "cumulative distribution
function" by the abbreviation cdf. As a matter of notation, P(X ≤ x)
will often be denoted by F(x) or G(x), etc.
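
A small sketch may help fix the two facts of this section: the table can be recovered from the jumps of the cdf, and P(b < X ≤ c) = F(c) - F(b). The table used here is a hypothetical one chosen only for illustration, and the function names are not from the text.

```python
from fractions import Fraction

# Hypothetical table chosen only for illustration (not from the text).
values = [0, 1, 3]
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

def cdf(x):
    """F(x) = P(X <= x): add the probabilities of the possible values <= x."""
    return sum((p for v, p in zip(values, probs) if v <= x), Fraction(0))

def jump_heights():
    """Recover the table from the cdf: the jump at each possible value."""
    heights, previous = {}, Fraction(0)
    for v in sorted(values):
        heights[v] = cdf(v) - previous
        previous = cdf(v)
    return heights

def interval_probability(b, c):
    """P(b < X <= c) computed directly from the table."""
    return sum((p for v, p in zip(values, probs) if b < v <= c), Fraction(0))

b, c = 0, 3
print(cdf(c) - cdf(b), interval_probability(b, c))  # the two should agree
print(jump_heights())                               # should reproduce the table
```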
2.5. Multivariate Chance Variables. In Sec. 2.1 we defined a chance
variable as a number that will appear when an experiment is performed.
Now we generalize this by considering an experiment with each possible
outcome being a pair of numbers. For example, the experiment might
consist of choosing a person at random from the city telephone directory
and measuring his height and weight. Then each possible outcome is a
pair of numbers, one giving a height, the other the weight. Or another
experiment could consist of shuffling four cards and choosing the top card,
each of the cards being labeled with a pair of numbers as follows: (-1,2),
(3,0), (1,1), (-2,-6). In such a case, the pair of numbers that will
appear when the experiment is performed is called a pair of chance
variables.

Similarly, there are experiments where each possible outcome is a set of
three numbers. In such a case, the set of three numbers that will appear
when the experiment is performed is called a set of three chance variables.
In general, each possible outcome of an experiment may be a set of r
numbers. Then the set of r numbers that will appear when the experi-
ment is performed is called a set of r chance variables.


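2.6. Multivariate Probability Distributions. As in the case of a
single chance variable, information about a set of chance variables is
given by a list of the possible sets of values with their respective
probabilities. For example, the pair of chance variables defined by the
card experiment in the illustration of Sec. 2.5 is completely described by

Possible pairs of values    (-1,2)   (3,0)   (1,1)   (-2,-6)
Probabilities                1/4      1/4     1/4     1/4

Suppose we denote the pair of chance variables by X, Y, with X denoting
the first number of the pair that will appear and Y the second. Then the
same distribution can be written as a two-way table,

                     Possible values of X
                     -2     -1     1      3
Possible     -6      1/4    0      0      0
values        0      0      0      0      1/4
of Y          1      0      0      1/4    0
              2      0      1/4    0      0

the number in a cell being the probability that X, Y will be the pair of
values in the headings for that cell. Such a table is called the joint
probability distribution of X, Y.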
For purposes of writing general formulas, it will be convenient to
represent the general joint probability distribution of X, Y by

                     Possible values of X
                     x_1    x_2    ...    x_k
Possible     y_1     p_11   p_21   ...    p_k1
values       y_2     p_12   p_22   ...    p_k2
of Y         ...
             y_h     p_1h   p_2h   ...    p_kh

where p_ij = P(X = x_i and Y = y_j).

In the same way, we define joint probability distributions for three or
more chance variables, as a list of possible sets of values with their
respective probabilities.

2.7. Expected Values in the Multivariate Case. Suppose the pair of
chance variables X, Y has a joint probability distribution written in the
general form of Sec. 2.6, and suppose g(x,y) is a given function. Then
the expected value of g(X,Y), denoted by E{g(X,Y)}, is defined as the
value

\sum_{i=1}^{k} \sum_{j=1}^{h} p_ij g(x_i, y_j)

As a numerical example, suppose X, Y have
the following joint probability distribution:


¥ 
-1 0 1 

ӯ 2| и Me Y 

4| M Ma Y 


Then, E(X Y) = (00—00) + 000-00 + ADOD + 018000) 


+ CADO) + 29000) = %- ; ; 
The physical interpretation of E{g(X,Y)} is exactly the same as in the
case of a single chance variable. Given any joint probability distribution,
we can construct a physical experiment defining a pair of chance variables
with the given distribution. For example, this could be done by properly
labeling the circumference of a round dial with pairs of numbers, and
then having a well-balanced spinner mounted on the dial. The experi-
ment can be performed N times, (t_i, u_i) denoting the pair of numbers that
appears on the ith performance of the experiment. Then if N is large, the
average

[g(t_1, u_1) + ... + g(t_N, u_N)] / N

can be expected to be close to E{g(X,Y)}.
Similarly, suppose we have a set of three chance variables X, Y, Z,
with X having the possible values x_1, ..., x_k; Y having the possible values
y_1, ..., y_h; Z having the possible values z_1, ..., z_m. Let p_ijq denote
P(X = x_i and Y = y_j and Z = z_q). Then, if g(x,y,z) is a given function,
the expected value of g(X,Y,Z), denoted by E{g(X,Y,Z)}, is defined as

\sum_{i=1}^{k} \sum_{j=1}^{h} \sum_{q=1}^{m} p_ijq g(x_i, y_j, z_q)

The definition of expected value when we are dealing with more than
three chance variables follows the same pattern.

If r(x,y,z) and s(x,y,z) are given functions and c, d are given constants,
it follows directly from the definitions that E{c r(X,Y,Z) + d s(X,Y,Z)} =
c E{r(X,Y,Z)} + d E{s(X,Y,Z)}.
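
As a check on the definition and on the linearity property, here is a minimal sketch for the two-variable case. It uses the four-card illustration of Sec. 2.5 as the joint distribution; the functions r, s and the constants c, d are arbitrary choices made for the example.

```python
from fractions import Fraction

# Joint distribution of the four-card illustration of Sec. 2.5:
# each labeled pair (x, y) has probability 1/4.
joint = {(-1, 2): Fraction(1, 4), (3, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 4), (-2, -6): Fraction(1, 4)}

def expect(g):
    """E{g(X,Y)}: sum of p_ij * g(x_i, y_j) over all cells of the table."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

r = lambda x, y: x * y      # illustrative choice of r(x,y)
s = lambda x, y: x + y      # illustrative choice of s(x,y)
c, d = 2, 3                 # illustrative constants

lhs = expect(lambda x, y: c * r(x, y) + d * s(x, y))
rhs = c * expect(r) + d * expect(s)
print(expect(r), lhs, rhs)  # lhs and rhs should be equal
```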


2.8. Multivariate cdfs. If X, Y is a pair of chance variables with
a given joint probability distribution, then for any given values x, y, we
can compute P(X ≤ x and Y ≤ y). P(X ≤ x and Y ≤ y) is a function
of x and y, and is called the "joint cumulative distribution function for
X, Y." This joint cdf for X, Y contains the same information as the
joint probability distribution in table form: The possible pairs of values
are the points (x,y) where the function P(X ≤ x and Y ≤ y) has a jump,
and the corresponding probability is the height of the jump.

Let us denote P(X ≤ x and Y ≤ y) by F(x,y). If a_1, a_2, b_1, b_2 are any
given values with a_1 < a_2 and b_1 < b_2, we shall prove that
P(a_1 < X ≤ a_2 and b_1 < Y ≤ b_2) = F(a_2,b_2) - F(a_2,b_1) - F(a_1,b_2) + F(a_1,b_1).
To prove this we define the events E_1, E_2, E_3, E_4 as follows:

E_1 is the event (a_1 < X ≤ a_2 and b_1 < Y ≤ b_2).
E_2 is the event (X ≤ a_1 and b_1 < Y ≤ b_2).
E_3 is the event (a_1 < X ≤ a_2 and Y ≤ b_1).
E_4 is the event (X ≤ a_1 and Y ≤ b_1).

The events E_1, E_2, E_3, E_4 are mutually exclusive by pairs. The event
(X ≤ a_2 and Y ≤ b_2) is the same event as (E_1 or E_2 or E_3 or E_4); the event
(X ≤ a_1 and Y ≤ b_2) is the same event as (E_2 or E_4); the event (X ≤ a_2
and Y ≤ b_1) is the same event as (E_3 or E_4). Thus we have

F(a_2,b_2) = P(X ≤ a_2 and Y ≤ b_2) = P(E_1 or E_2 or E_3 or E_4)
           = P(E_1) + P(E_2) + P(E_3) + P(E_4);
F(a_1,b_2) = P(X ≤ a_1 and Y ≤ b_2) = P(E_2 or E_4) = P(E_2) + P(E_4);
F(a_2,b_1) = P(X ≤ a_2 and Y ≤ b_1) = P(E_3 or E_4) = P(E_3) + P(E_4);
F(a_1,b_1) = P(X ≤ a_1 and Y ≤ b_1) = P(E_4).

From these relationships, we find

P(a_1 < X ≤ a_2 and b_1 < Y ≤ b_2) = P(E_1)
    = F(a_2,b_2) - F(a_2,b_1) - F(a_1,b_2) + F(a_1,b_1)
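
The rectangle formula can be verified on a concrete table. The sketch below again assumes the four-card distribution of Sec. 2.5 and compares the formula with a direct count over the cells; the rectangle corners are arbitrary illustrative values.

```python
from fractions import Fraction

# Four-card illustration of Sec. 2.5: each labeled pair has probability 1/4.
joint = {(-1, 2): Fraction(1, 4), (3, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 4), (-2, -6): Fraction(1, 4)}

def F(x, y):
    """Joint cdf F(x,y) = P(X <= x and Y <= y)."""
    return sum((p for (u, v), p in joint.items() if u <= x and v <= y),
               Fraction(0))

def rectangle_direct(a1, a2, b1, b2):
    """P(a1 < X <= a2 and b1 < Y <= b2) summed directly from the table."""
    return sum((p for (u, v), p in joint.items()
                if a1 < u <= a2 and b1 < v <= b2), Fraction(0))

def rectangle_from_cdf(a1, a2, b1, b2):
    """The same probability obtained from four values of the joint cdf."""
    return F(a2, b2) - F(a2, b1) - F(a1, b2) + F(a1, b1)

a1, a2, b1, b2 = -2, 1, 0, 2   # illustrative corners with a1 < a2, b1 < b2
print(rectangle_direct(a1, a2, b1, b2), rectangle_from_cdf(a1, a2, b1, b2))
```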
2.9. Marginal Probability Distributions. Suppose the pair of chance
variables X, Y has the joint probability distribution

                     Possible values of X
                     x_1    x_2    ...    x_k
Possible     y_1     p_11   p_21   ...    p_k1
values       y_2     p_12   p_22   ...    p_k2
of Y         ...
             y_h     p_1h   p_2h   ...    p_kh

X by itself is a chance variable, with possible values x_1, x_2, ..., x_k.
Also

P(X = x_i) = \sum_{j=1}^{h} p_ij

That is, we get P(X = x_i) by adding all the probabilities in the column
headed x_i in the table. To see this, we note that the event (X = x_i) is the
same event as [(X = x_i and Y = y_1) or (X = x_i and Y = y_2) or ... or
(X = x_i and Y = y_h)], and the events (X = x_i and Y = y_1), (X = x_i and
Y = y_2), ..., (X = x_i and Y = y_h) are mutually exclusive by pairs; thus

P(X = x_i) = \sum_{j=1}^{h} P(X = x_i and Y = y_j) = \sum_{j=1}^{h} p_ij

as stated.

Similarly, Y by itself is a chance variable, with possible values
y_1, y_2, ..., y_h, and

P(Y = y_j) = \sum_{i=1}^{k} p_ij

by the same reasoning we used for X. These separate probability
distributions of X and Y we get by adding probabilities from a joint
probability distribution are called "marginal probability distributions."
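
The column and row sums can be written out as a short sketch. The joint table below is hypothetical, chosen only so that the sums are easy to follow; `marginal` is an illustrative helper name.

```python
from fractions import Fraction as Fr

# Hypothetical joint table: joint[(x, y)] = P(X = x and Y = y).
joint = {(0, 0): Fr(1, 8), (1, 0): Fr(1, 4), (2, 0): Fr(1, 8),
         (0, 1): Fr(1, 4), (1, 1): Fr(1, 8), (2, 1): Fr(1, 8)}

def marginal(joint, index):
    """Add the joint probabilities over the other variable.

    index 0 gives the marginal distribution of X (column sums),
    index 1 gives the marginal distribution of Y (row sums).
    """
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], Fr(0)) + p
    return out

marginal_x = marginal(joint, 0)
marginal_y = marginal(joint, 1)
print(marginal_x, marginal_y)
```

Each marginal distribution sums to 1, as any probability distribution must.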


As a numerical example: 


[Tables: JOINT PROBABILITY DISTRIBUTION of X, Y; MARGINAL PROBABILITY DISTRIBUTION OF Y]


gives the same marginal probability distributions as the different joint 
probability distribution above. 


Suppose X, Y have the joint cumulative distribution function F(x,y).
The chance variable X by itself has a cumulative distribution function,
called the "marginal cumulative distribution function for X," denoted by
F_1(x). (The subscript 1 shows that it is the first of the
variables we are dealing with.) We want a formula which will
enable us to compute F_1(x) from a knowledge of F(x,y). Since F_1(x) =
P(X ≤ x), we have

F_1(x) = \sum_{x_i ≤ x} \sum_{j=1}^{h} p_ij = F(x, \bar{y})

where \bar{y} is any value greater than max(y_1, ..., y_h). We note that we
may write F(x, \bar{y}) as

\lim_{y \to \infty} F(x,y)

which gives us the formula

F_1(x) = \lim_{y \to \infty} F(x,y)

Similarly, the marginal cdf for Y, F_2(y), is given by the formula

F_2(y) = \lim_{x \to \infty} F(x,y)


If we have a set of three chance variables X, Y, Z, with X having
possible values x_1, ..., x_k, Y having possible values y_1, ..., y_h, Z having
possible values z_1, ..., z_m, let p_ijq denote P(X = x_i and Y = y_j and
Z = z_q). The joint marginal distribution of X, Y is given by

P(X = x_i and Y = y_j) = \sum_{q=1}^{m} p_ijq

The marginal distribution of X is given by

P(X = x_i) = \sum_{j=1}^{h} \sum_{q=1}^{m} p_ijq

There are corresponding joint marginal distributions for X, Z and for Y, Z,
and marginal distributions for Y and for Z, all derived from the joint
probability distribution of X, Y, Z analogously.

The joint cdf for X, Y, Z is the function F(x,y,z) = P(X ≤ x and
Y ≤ y and Z ≤ z). The joint marginal cdf for X, Y, denoted by F_{1,2}(x,y),
is given by

F_{1,2}(x,y) = \lim_{z \to \infty} F(x,y,z)

We also have

F_{1,3}(x,z) = \lim_{y \to \infty} F(x,y,z)
F_{2,3}(y,z) = \lim_{x \to \infty} F(x,y,z)

The marginal cdf for X is

F_1(x) = \lim_{y \to \infty, z \to \infty} F(x,y,z)

with corresponding definitions for F_2(y), F_3(z).

2.10. Conditional Probability Distributions and Conditional Expected
Values. Given the pair of chance variables X, Y, the "conditional
probability distribution of X given that Y = y_j" is defined as the follow-
ing probability distribution:

Possible values    x_1                    x_2                    ...   x_k
Probabilities      P(X = x_1 | Y = y_j)   P(X = x_2 | Y = y_j)   ...   P(X = x_k | Y = y_j)

where of course

P(X = x_i | Y = y_j) = P(X = x_i and Y = y_j) / P(Y = y_j)

in accordance with the definition of conditional probability given in
Chap. 1. The "conditional expected value of g(X) given that Y = y_j" is
denoted by the symbol E{g(X) | Y = y_j} and is defined as the quantity

\sum_{i=1}^{k} g(x_i) P(X = x_i | Y = y_j)

As a numerical example, if we have the joint distribution 


X 
үй 7 
21и: Me MW 
34 
4| м Ma Y 
Е: vam | 


tional distribution of X given that Y-2is 


=j 6 1 


T 2 
Probabilities ж 6 ?6 


= (%\(-1? + (X0? + CIAF = 56. f 


then the condi 
Possible values 



апа E(X?|Y = 2) 
0 


м fey 
The physical interpretation of E{g(X) | Y = y_j} is analogous to the
interpretation of an ordinary expected value. We construct a physical
experiment defining a pair of chance variables X, Y with the given joint
probability distribution, and we perform the experiment N times. We
disregard all outcomes except those on which the observed value of Y
was y_j, and we take the average of the observed values of g(X) on the out-
comes where the observed value of Y was y_j. This average is expected to
be close to E{g(X) | Y = y_j}.

If X, Y, Z are three chance variables with a given joint probability
distribution, the joint conditional distribution of X, Y given that Z = z_q
is given by using the conditional probabilities P(X = x_i and Y =
y_j | Z = z_q). The conditional probability distribution of X given that
Y = y_j and Z = z_q is given by using the conditional probabilities
P(X = x_i | Y = y_j and Z = z_q).

We can generalize the definitions above by using more general condi-
tions. For example, we could define the conditional distribution of X
given that b < Y < c by using P(X = x_i | b < Y < c).

Any conditional probability distribution has associated with it a
conditional cdf. For example, the conditional cdf for X given that
Y = y_j is the function P(X ≤ x | Y = y_j).
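
The definitions of this section translate directly into a short computation. The sketch below assumes a hypothetical joint table (not one from the text) and computes the conditional distribution of X given Y = 0 together with E{X^2 | Y = 0}; the helper names are illustrative.

```python
from fractions import Fraction as Fr

# Hypothetical joint table: joint[(x, y)] = P(X = x and Y = y).
joint = {(0, 0): Fr(1, 8), (1, 0): Fr(1, 4), (2, 0): Fr(1, 8),
         (0, 1): Fr(1, 4), (1, 1): Fr(1, 8), (2, 1): Fr(1, 8)}

def conditional_x_given_y(joint, y):
    """P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y) for each x."""
    p_y = sum(p for (_, v), p in joint.items() if v == y)
    return {x: p / p_y for (x, v), p in joint.items() if v == y}

def conditional_expectation(joint, g, y):
    """E{g(X) | Y = y}: sum of g(x) * P(X = x | Y = y) over the values x."""
    return sum(g(x) * p for x, p in conditional_x_given_y(joint, y).items())

dist = conditional_x_given_y(joint, 0)
value = conditional_expectation(joint, lambda x: x ** 2, 0)
print(dist, value)
```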


2.11. Independent Chance Variables. X, Y are called independent if
P(X = x and Y = y) = P(X = x)P(Y = y) for all values x, y. In
terms of the tabled joint probability distribution, this means that the
probability in any cell is the product of the two marginal probabilities in
the row and column of that cell. Two examples illustrate this.


X, Y NOT INDEPENDENT


97 
Y 76 
M 
ж M 15 
X, Y INDEPENDENT 
X 
=1 0 1] 
1 2% 
Ӯ 0 29 24 
M 


% 
M M 


If X, Y are independent, the conditional distribution of X given any
condition on Y is the same as the marginal distribution of X. Also, the
conditional distribution of Y given any condition on X is the same as the
marginal distribution of Y.
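If X, Y are independent and r(x), s(y) are any given functions, then

E{r(X)s(Y)} = \sum_{i=1}^{k} \sum_{j=1}^{h} r(x_i) s(y_j) P(X = x_i and Y = y_j)
            = \sum_{i=1}^{k} \sum_{j=1}^{h} r(x_i) s(y_j) P(X = x_i) P(Y = y_j)
            = [\sum_{i=1}^{k} r(x_i) P(X = x_i)] [\sum_{j=1}^{h} s(y_j) P(Y = y_j)]
            = E{r(X)} E{s(Y)}
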


If X, Y are independent, then the joint cdf F(x,y) is equal to F_1(x)F_2(y)
for all values x, y. To prove this, we have

F(x,y) = \sum_{x_i ≤ x} \sum_{y_j ≤ y} P(X = x_i and Y = y_j)
       = \sum_{x_i ≤ x} \sum_{y_j ≤ y} P(X = x_i) P(Y = y_j)
       = [\sum_{x_i ≤ x} P(X = x_i)] [\sum_{y_j ≤ y} P(Y = y_j)] = F_1(x) F_2(y)

Conversely, since the joint cdf contains all the information about the
joint distribution, if F(x,y) = F_1(x)F_2(y) for all values x, y, then X and Y
are independent.
The above statements can be generalized to more than two chance
variables. X, Y, Z are independent if P(X = x and Y = y and Z = z) =
P(X = x)P(Y = y)P(Z = z) for all x, y, z. If X, Y, Z are independent,
E{r(X)s(Y)t(Z)} = E{r(X)}E{s(Y)}E{t(Z)}. X, Y, Z are independent if
and only if F(x,y,z) = F_1(x)F_2(y)F_3(z) for all values x, y, z.

We can also define the independence of two sets of chance variables.
If X_1, ..., X_m, Y_1, ..., Y_n are chance variables such that P(X_1 = x_1 and
... and X_m = x_m and Y_1 = y_1 and ... and Y_n = y_n) = P(X_1 = x_1 and
... and X_m = x_m)P(Y_1 = y_1 and ... and Y_n = y_n) for all values
x_1, ..., x_m, y_1, ..., y_n, then we say that the set X_1, ..., X_m is independent
of the set Y_1, ..., Y_n.

Whenever chance variables are defined by separate physical experi-
ments, they are assumed to be independent. For example, if we shuffle
two separate decks of numbered cards and define X as the number on the
top of one deck and Y as the number on the top of the other deck, then
X, Y are independent.
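
A minimal sketch of the cell-by-cell test for independence, and of the product rule E{r(X)s(Y)} = E{r(X)}E{s(Y)}, is given below. Both joint tables are hypothetical: one is built as a product of assumed marginals (so it is independent by construction), the other is not a product table.

```python
from fractions import Fraction as Fr

def marginal(joint, index):
    """Marginal distribution of X (index 0) or of Y (index 1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], Fr(0)) + p
    return out

def is_independent(joint):
    """True when every cell equals the product of its two marginals."""
    mx, my = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, y), Fr(0)) == mx[x] * my[y]
               for x in mx for y in my)

def expect(joint, g):
    """E{g(X,Y)} for a tabled joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

# Hypothetical marginals; the product table is independent by construction.
px = {-1: Fr(1, 4), 0: Fr(1, 2), 1: Fr(1, 4)}
py = {0: Fr(1, 3), 1: Fr(2, 3)}
independent = {(x, y): px[x] * py[y] for x in px for y in py}

# Hypothetical table that is not a product of its marginals.
dependent = {(0, 0): Fr(1, 8), (1, 0): Fr(1, 4), (2, 0): Fr(1, 8),
             (0, 1): Fr(1, 4), (1, 1): Fr(1, 8), (2, 1): Fr(1, 8)}

r = lambda x: x ** 2        # illustrative r(x)
s = lambda y: 2 * y + 1     # illustrative s(y)
product_side = expect(independent, lambda x, y: r(x)) * expect(independent, lambda x, y: s(y))
joint_side = expect(independent, lambda x, y: r(x) * s(y))
print(is_independent(independent), is_independent(dependent), joint_side, product_side)
```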


Chapter 3 


CHANCE VARIABLES WITH 
AN INFINITE NUMBER OF 
POSSIBLE VALUES 


3.1. Cases in Which the Possible Values Can Be Listed in an Infinite
Sequence. We start with a simple example. Suppose our experiment
is tossing a well-balanced coin until a head appears, and the chance
variable X is defined as the total number of throws that must be made to
have a head appear. In this case, the possible values of X are 1, 2, 3, ...,
and if r is a positive integer, P(X = r) = P(tail on toss 1 and tail on toss 2
and ... and tail on toss r - 1 and head on toss r) = P(tail on toss 1)
P(tail on toss 2) ... P(tail on toss r - 1) P(head on toss r) = (1/2)^r. The
probability distribution of X in table form is

Possible values    1     2     3     4      ...
Probabilities      1/2   1/4   1/8   1/16   ...

We note that the sum of the probabilities is the infinite series 1/2 + 1/4 +
1/8 + ... = 1. The cdf F(x) for this case is given as follows:

F(x) = 0                        if x < 1
F(x) = 1/2                      if 1 ≤ x < 2
F(x) = 1/2 + 1/4 = 3/4          if 2 ≤ x < 3
F(x) = 1/2 + 1/4 + 1/8 = 7/8    if 3 ≤ x < 4

and so forth. In the general case of this sort, the probability distribution
in table form will be represented by

Possible values    x_1   x_2   x_3   ...
Probabilities      p_1   p_2   p_3   ...

where the infinite series p_1 + p_2 + p_3 + ... = 1. E{g(X)} is defined as
the infinite series p_1 g(x_1) + p_2 g(x_2) + p_3 g(x_3) + ..., provided that this
series converges absolutely. If the series does not converge absolutely,
we say that E{g(X)} does not exist. Thus, for the numerical case given
at the beginning of this section, E{X} = (1/2)(1) + (1/4)(2) + (1/8)(3) +
... = 2; but E{2^X} = (1/2)(2) + (1/4)(2^2) + (1/8)(2^3) + ... = 1 + 1 +
1 + ..., and this series does not converge, so E{2^X} does not exist for the
present case. When E{g(X)} exists, its physical interpretation is exactly
the same as in the case of a chance variable with a finite number of possible
values. When E{g(X)} fails to exist, it means that the average
[g(t_1) + ... + g(t_N)]/N (in the notation of Sec. 2.3) cannot be expected to be
close to any finite number, because of the relatively frequent occurrence
of quantities very large in absolute value among the numbers
g(t_1), ..., g(t_N).
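
The contrast between a convergent and a non-convergent expected value shows up clearly in the partial sums. This sketch assumes the coin-tossing distribution P(X = r) = (1/2)^r and simply adds the first n terms of each series; `partial_sum` is an illustrative name.

```python
from fractions import Fraction

def partial_sum(g, n_terms):
    """First n terms of the series sum over r of g(r) * P(X = r), P(X = r) = (1/2)^r."""
    return sum(g(r) * Fraction(1, 2 ** r) for r in range(1, n_terms + 1))

# E{X}: the partial sums settle down to a finite limit.
for n in (5, 10, 20, 60):
    print(n, float(partial_sum(lambda r: r, n)))

# E{2^X}: every term equals 1, so the partial sums grow without bound.
for n in (5, 10, 20, 60):
    print(n, partial_sum(lambda r: 2 ** r, n))
```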


3.2. Chance Variables with a Continuum of Possible Values. We
introduce this case with a simple example. Suppose we have a
well-balanced arrowheaded spinner mounted on a round dial with cir-
cumference equal to 1. The dial is labeled as in Fig. 3.1. We spin the
spinner and define the chance variable X as the number on the dial to
which the arrowhead will point (if the arrowhead points to the place that
could be read as either 0 or 1, let us agree that it will be read as 1).

Fig. 3.1

In any such experiment that could actually be carried out in practice, it
would be possible to read only a finite number of places on the dial, and
thus X would have only a finite number of possible values. However, we
are interested in the limiting ideal case where an infinite number of
decimal places can be read, so that any value between 0 and 1 is a possible
outcome of the experiment. We study this case by finding the cdf. If we
could read only two decimal places, the cdf for X would have 100 equally
spaced jumps, each of height 1/100. If we could read three decimal places,
the cdf would have 1,000 equally spaced jumps, each of height 1/1,000.
Clearly, as the number of decimal places we can read increases, the
cdf approaches the function F(x) defined as follows: F(x) = 0 for
x < 0, F(x) = x for 0 ≤ x ≤ 1, F(x) = 1 for x > 1. This function
F(x) is the cdf for X in the limiting case where an infinite number of
decimal places can be read on the dial. The fact that P(X ≤ x) = x for
0 ≤ x ≤ 1 in the limiting case is also obvious from the intuitive meaning
of "well-balanced spinner." For example, P(X ≤ 1/4) must be equal to
1/4, since the set of points on the dial which are labeled with values no
greater than 1/4 cover one-quarter of the total circumference.
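
The approach of the step-function cdf to F(x) = x can be seen numerically. The sketch below makes an assumption the text does not spell out: with d readable decimal places the possible readings are taken to be j/10^d for j = 1, ..., 10^d, each with probability 1/10^d. The grid of test points is likewise an arbitrary choice.

```python
import math
from fractions import Fraction

def discrete_cdf(x, places):
    """cdf when only `places` decimal places can be read (assumed readings j/10**places)."""
    n = 10 ** places
    if x < 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    return Fraction(math.floor(x * n), n)

def limiting_cdf(x):
    """F(x) = 0 for x < 0, x for 0 <= x <= 1, 1 for x > 1."""
    return min(max(x, Fraction(0)), Fraction(1))

def largest_gap(places, grid=997):
    """Largest difference between the two cdfs over a grid of points in [0, 1]."""
    points = [Fraction(i, grid) for i in range(grid + 1)]
    return max(abs(discrete_cdf(x, places) - limiting_cdf(x)) for x in points)

for places in (1, 2, 3, 4):
    print(places, float(largest_gap(places)))
```

The gap never exceeds the height of one jump, so it shrinks by a factor of ten with each additional decimal place.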


3.3. General Properties of cdf's. In Sec. 3.2 we discussed a cdf that
was a continuous function. Earlier, we had discussed cdf's that had dis-
continuities. The following properties are possessed by every cdf F(x):

1. For any values a, b, with a < b, F(a) ≤ F(b).
2. \lim_{x \to -\infty} F(x) = 0.
3. \lim_{x \to \infty} F(x) = 1.
4. \lim_{\Delta \to 0, \Delta > 0} [F(x + \Delta) - F(x)] = 0 for every value x.

Conversely, any function that possesses these four properties is the
cdf for some chance variable X. The following type of physical experi-
ment can be used to define chance variables with a great variety of cdf's.
A well-balanced spinner, with an arrowhead at each end, is set spinning.
Below the spinner is a line extending indefinitely in both directions, and
on the line is engraved a scale S, which can be chosen arbitrarily. When
the spinner comes to rest, an imaginary line is drawn through the spinner
and extended until it meets the scale S (the probability is 0 that the spinner
will be parallel to S). The chance variable X is the value on S intersected
by the imaginary line through the spinner. Figure 3.2 shows this.

Fig. 3.2

Fig. 3.3

The cdf for X is determined by the scale S. Some examples follow.
Example 1. The scale S is an ordinary arithmetic scale, and the
center of the spinner is A units vertically above the zero point on the scale.
Remembering that the spinner is well balanced, it is
clear from the definition of the angle θ in Fig. 3.4 that F(x) =
1/2 + (1/π) arc tan (x/A). The graph of F(x) in this case is shown in Fig. 3.5.

Fig. 3.4

Fig. 3.5

Example 2. The scale S is broken into subintervals, and every point in
a given subinterval is labeled with the same number, as in Fig. 3.6. In
this case, if the scale S is broken into a finite number of subintervals
(with the subintervals on each end having infinite length), then X has
only a finite number of possible values, and the cdf F(x) is of the
type discussed in Chap. 2: it is horizontal except at the finite
number of points at which it has a jump. If the scale S is broken into
an infinite number of subintervals, then the chance variable X has an
infinite number of possible values, which can be listed in a sequence, as
in Sec. 3.1. The cdf in this case is horizontal except at the infinite
number of points where it has a jump.

Example 3. This example is a mixture of Examples 1 and 2. The
scale S is broken into subintervals, and some subintervals have all their
points labeled with the same number (as in Example 2), while the other
subintervals are labeled with arithmetic scales. Then the cdf will have
the type of graph shown in Fig. 3.7.
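
Example 1 can be imitated by simulation. The sketch below rests on a modeling assumption: because the spinner has an arrowhead at each end, the line through it makes an angle with the vertical that is taken to be uniformly distributed between -π/2 and π/2, and that line meets an arithmetic scale at A tan(angle). The value of A, the number of spins, and the function names are illustrative.

```python
import math
import random

A = 2.0  # assumed height of the spinner's center above the zero point of S

def spin_once(rng):
    """One performance: angle from the vertical, then the point hit on scale S."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return A * math.tan(theta)

def F(x):
    """cdf of Example 1: F(x) = 1/2 + (1/pi) arc tan(x/A)."""
    return 0.5 + math.atan(x / A) / math.pi

rng = random.Random(0)
outcomes = [spin_once(rng) for _ in range(200_000)]

def empirical_cdf(x):
    """Proportion of simulated outcomes that are <= x."""
    return sum(1 for t in outcomes if t <= x) / len(outcomes)

for x in (-A, 0.0, 2 * A):
    print(x, empirical_cdf(x), F(x))
```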

No matter what type of cdf F(x) the chance variable X has, we see from
Sec. 2.4 that for any given values a, b, with a < b, we have P(a < X ≤
b) = F(b) - F(a). If we hold a fixed and let b approach a, we see
from property 4 of a cdf that P(a < X ≤ b) approaches 0. How-
ever, if we fix b and let a approach b, then P(a < X ≤ b) approaches the
jump in F(x) at the point b [if F(x) is continuous at the point b, it has a
jump of 0]. But it is clear that

\lim_{a \to b} P(a < X ≤ b) = P(X = b)

Thus if a cdf is continuous at a point b, P(X = b) = 0.

Fig. 3.6

Fig. 3.7


3.4. Probability Density Functions. Suppose the cdf F(x) is con-
tinuous everywhere and has a derivative at all points, except perhaps at a
finite number of exceptional points. Then the derivative is called a
"probability density function," or "pdf," for the chance variable X. As
a matter of notation, the derivatives of the cdf's F(x), G(x) will be denoted
by f(x), g(x), respectively.

As an example, the pdf f(x) corresponding to the cdf F(x) of Sec. 3.2 is
f(x) = 1 for 0 < x < 1; f(x) = 0 for x < 0 or x > 1; f(x) does not
exist at x = 0 or x = 1.

As another example, the pdf f(x) corresponding to the cdf F(x) of
Example 1 of Sec. 3.3 is

f(x) = 1 / (πA[1 + (x/A)^2])

If F(x) is a cdf and f(x) is the corresponding pdf, then

\int_{a}^{b} f(x) dx = F(b) - F(a)

In particular,

\int_{-\infty}^{\infty} f(x) dx = \lim_{x \to \infty} F(x) - \lim_{x \to -\infty} F(x) = 1 - 0 = 1

Thus any pdf has the following two properties:

1. \int_{-\infty}^{\infty} f(x) dx = 1.
2. f(x) ≥ 0 for each x, since f(x) is the derivative of F(x), a non-
decreasing function.

Conversely, any function with these two properties is the pdf for some
chance variable X.

Since

\int_{-\infty}^{b} f(x) dx = F(b) - \lim_{x \to -\infty} F(x) = F(b)

if we are given a pdf, we can find the corresponding cdf by integrating.
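
The relation between a pdf and its cdf can be checked by numerical integration. This sketch assumes the pdf and cdf of Example 1 of Sec. 3.3 with an arbitrary value of A, and uses a simple midpoint rule; the limits a, b and the helper name `integrate` are illustrative.

```python
import math

A = 2.0  # assumed value of the constant A in Example 1 of Sec. 3.3

def f(x):
    """pdf: f(x) = 1 / (pi * A * [1 + (x/A)^2])."""
    return 1.0 / (math.pi * A * (1.0 + (x / A) ** 2))

def F(x):
    """cdf: F(x) = 1/2 + (1/pi) arc tan(x/A)."""
    return 0.5 + math.atan(x / A) / math.pi

def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of func from a to b."""
    h = (b - a) / n
    return h * sum(func(a + (j + 0.5) * h) for j in range(n))

a, b = -1.0, 3.0
area = integrate(f, a, b)
print(area, F(b) - F(a))   # the two numbers should agree closely
```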


3.5. Expected Values. If X is a chance variable with pdf f(x) and
g(x) is a given function, then E{g(X)} is defined as

\int_{-\infty}^{\infty} g(x) f(x) dx

The physical interpretation of E{g(X)} is the same as in the case
where X has only a finite number of possible values. To see this,
suppose X has only a finite number of possible values, say x_1, x_2, ..., x_k,
with probabilities p_1, p_2, ..., p_k. Suppose A is a value less than the
smallest of the values x_1, x_2, ..., x_k. Then

E{g(X)} = \sum_{i=1}^{k} g(x_i) p_i
        = \lim_{\Delta x \to 0} \sum_{j=0}^{\infty} g(A + j \Delta x)[F(A + (j + 1) \Delta x) - F(A + j \Delta x)]

because F[A + (j + 1) \Delta x] - F(A + j \Delta x) = 0, unless one or more of
the values x_i falls between A + j \Delta x and A + (j + 1) \Delta x, and if only one
of the values x_i falls in the interval, F[A + (j + 1) \Delta x] - F(A + j \Delta x) =
p_i. Now if we return to the case where X has a pdf f(x), we find

\lim_{\Delta x \to 0} \sum_{j=0}^{\infty} g(A + j \Delta x)[F(A + (j + 1) \Delta x) - F(A + j \Delta x)] = \int_{A}^{\infty} g(x) f(x) dx

Letting A approach -\infty, we get E{g(X)}.

As an example of the computation of an expected value, suppose X has
pdf f(x) defined as follows: f(x) = 1 if 0 < x < 1; f(x) = 0 for other
values of x. Then

E{X} = \int_{-\infty}^{\infty} x f(x) dx
     = \int_{-\infty}^{0} x f(x) dx + \int_{0}^{1} x f(x) dx + \int_{1}^{\infty} x f(x) dx
     = \int_{-\infty}^{0} x (0) dx + \int_{0}^{1} x dx + \int_{1}^{\infty} x (0) dx
     = 0 + 1/2 + 0 = 1/2

Also

E{X^2} = \int_{-\infty}^{\infty} x^2 f(x) dx = 0 + 1/3 + 0 = 1/3

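
The limiting sum used in this section can be evaluated directly for a small step. The sketch below assumes the uniform cdf of Sec. 3.2 (F(x) = x on the unit interval), picks A = -1 and a step of 0.0001, and stops the sum at x = 2 because the cdf no longer increases beyond 1. The function name `stieltjes_sum` is an illustrative label, not the text's.

```python
def F(x):
    """cdf of Sec. 3.2: 0 for x < 0, x for 0 <= x <= 1, 1 for x > 1."""
    return min(max(x, 0.0), 1.0)

def stieltjes_sum(g, A, dx, upper):
    """Sum over j of g(A + j dx) [F(A + (j+1) dx) - F(A + j dx)], up to `upper`."""
    n = int(round((upper - A) / dx))
    total = 0.0
    for j in range(n):
        left = A + j * dx
        total += g(left) * (F(left + dx) - F(left))
    return total

ex = stieltjes_sum(lambda x: x, A=-1.0, dx=1e-4, upper=2.0)
ex2 = stieltjes_sum(lambda x: x * x, A=-1.0, dx=1e-4, upper=2.0)
print(ex, ex2)   # compare with E{X} = 1/2 and E{X^2} = 1/3
```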

3.6. Expected Values in More Complicated Cases. We now have 
formulas for E{g(X)} in cases where the cdf F(x) increases only in jumps 
and in cases where F(x) is continuous and has a derivative. But Exam- 
ple 3 of Sec. 3.3 showed that some cdf's are mixtures, increasing both 
continuously and in jumps. If F(x) is such a mixed cdf, we have 
F(x) = R(x) + S(x), where R(x) increases only in jumps and S(x) is a
continuous function. R(x) and S(x) have all the properties of cdf's
except that

\lim_{x \to \infty} R(x) < 1   and   \lim_{x \to \infty} S(x) < 1

since

\lim_{x \to \infty} [R(x) + S(x)] = 1

Let x_1, x_2, ..., denote the points on the x axis at which R(x) has jumps,
and let r(x_i) denote the height of the jump in R(x) at x_i. Also
assume that S(x) has a derivative s(x) everywhere, except perhaps at a
finite number of points. Then E{g(X)} is defined as

\sum_{i} g(x_i) r(x_i) + \int_{-\infty}^{\infty} g(x) s(x) dx

This has the usual physical interpretation.
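
A mixed case can be worked through numerically. The sketch below assumes a hypothetical mixed cdf: R(x) has jumps of height 1/4 at x = 0 and at x = 1, and S(x) has derivative s(x) = 1/2 on the interval from 0 to 1 and 0 elsewhere, so that the jumps and the continuous part together account for total probability 1. The midpoint rule stands in for the integral.

```python
# Hypothetical mixed distribution (illustration only).
jumps = {0.0: 0.25, 1.0: 0.25}   # r(x_i): jump heights of R(x)

def s(x):
    """Derivative of the continuous part S(x)."""
    return 0.5 if 0.0 < x < 1.0 else 0.0

def expect_mixed(g, lo=0.0, hi=1.0, n=100_000):
    """E{g(X)} = sum of g(x_i) r(x_i) plus the integral of g(x) s(x) dx."""
    discrete = sum(g(x) * r for x, r in jumps.items())
    h = (hi - lo) / n
    continuous = h * sum(g(lo + (j + 0.5) * h) * s(lo + (j + 0.5) * h)
                         for j in range(n))
    return discrete + continuous

mean = expect_mixed(lambda x: x)
second_moment = expect_mixed(lambda x: x * x)
print(mean, second_moment)
```

For this assumed distribution the exact values are 1/4 + 1/4 = 1/2 and 1/4 + 1/6 = 5/12.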


3.7. Transformation of a Chance Variable. Sometimes we start with
a chance variable X, but find it more convenient to deal with another
chance variable defined as a given function of X. For example, if X is
the distance from the point of impact of a missile to the target, it might
turn out that with respect to the effect on the target, X^2 is the important
quantity. Then, instead of the probability distribution of X, we would
be interested in the probability distribution of X^2. And in general we may
be interested in the probability distribution of the chance variable
h(X) [where h(x) is some given function] rather than the probability
distribution of X itself. We note that h(X) is a chance variable which can
be defined by the same physical experiment that defines X: the experiment
can be performed, and X observed, but then replaced by h(X). Thus we
are merely relabeling our outcome.

We should like to develop methods for finding the probability distribu-
tion of h(X), starting from the known probability distribution of X. In a
case where X has only a finite number of possible values, so that the
probability distribution can be given in table form, it is a very simple
matter to find the probability distribution of h(X) by relabeling the
entries in the table. Two examples will make this clear.

Example 1. If the probability distribution of X is


-3 0 5 


14 1ё d£ 


then the probability distribution of 2X + 3 is

-3 = 2(-3) + 3     3 = 2(0) + 3     13 = 2(5) + 3


E % % 

Example 2. [In this example, two different values of X lead to the
same value of h(X), so the probabilities must be added together.] If the
probability distribution of X is


; 4 
15 yw v5 


then the probability distribution of X^2 is


бї 4 9 

Злы 

A M 

In a case where X has an infinite number of possible values which can
be listed in a sequence, so that the distribution can be given in table form,
the method is the same as in the examples just discussed. Thus, if the
probability distribution of X is

Possible values    1     2     3     4      ...
Probabilities      1/2   1/4   1/8   1/16   ...

then the probability distribution of 2X is

2     4     6     8      ...
1/2   1/4   1/8   1/16   ...

while the probability distribution of (X^2 - 1) is

0     3     8     15     ...
1/2   1/4   1/8   1/16   ...


In many important problems, it is necessary to find the distribution of
h(X) when X has a continuous cdf F(x) with corresponding pdf f(x). In
such a case, the distribution of X cannot be given in the form of a table,
since the possible values of X fill a continuum, so new techniques will be
required to find the probability distribution of h(X). We assume that
h(x) has a derivative at every x. Also, for the time being, we assume that
h(x) is either a strictly increasing function of x or else a strictly decreasing
function of x. For convenience, we denote the chance variable h(X) by
Y, and we denote the cdf for Y by K(y); that is, P(Y ≤ y) = K(y). Also,
we denote by r(Y) the function that X is of Y. For example, if h(X) =
2X + 3, then Y = 2X + 3, so X = (Y - 3)/2 and r(Y) = (Y - 3)/2.
For a given value y, the event (Y ≤ y) is the same event as [h(X) ≤ y],
which in turn is the same event as [X ≤ r(y)] if h(x) is an increasing
function, or is the same event as [X ≥ r(y)] if h(x) is a decreasing function.
From this it follows that if h(x) is increasing, K(y) = F[r(y)]; if h(x) is
decreasing, K(y) = 1 - F[r(y)]. Then, by the rules for differentiation,
(d/dy)K(y) exists and is equal to f[r(y)] |dr(y)/dy|, whether r(y) is
increasing or decreasing. Thus the pdf for the chance variable Y = h(X)
is equal to f[r(y)] |dr(y)/dy|.

A second method of finding the pdf for Y will now be described. If
k(y) is the pdf for Y and g(y) is a given function, then

E{g(Y)} = \int_{-\infty}^{\infty} g(y) k(y) dy

But

g(Y) = g[h(X)]   and   E{g[h(X)]} = \int_{-\infty}^{\infty} g[h(x)] f(x) dx

If in this integral we make the change of variable y = h(x), the integral
becomes \int_{-\infty}^{\infty} g(y) f[r(y)] |dr(y)/dy| dy. Thus we have

\int_{-\infty}^{\infty} g(y) k(y) dy = \int_{-\infty}^{\infty} g(y) f[r(y)] |dr(y)/dy| dy

for each function g(y). The only way for this to hold is to have

k(y) = f[r(y)] |dr(y)/dy|

which is what we found above. Two examples illustrate the discussion.
Example 1. X has the pdf f(x) defined as follows: f(x) = 0 for x < 0;
f(x) = 1 for 0 < x < 1; f(x) = 0 for x > 1. Y = 2X - 3. Then
X = (Y + 3)/2 = r(Y), so dr(y)/dy = 1/2; and f[r(y)] = 1 for
0 < (y + 3)/2 < 1, or -3 < y < -1, and f[r(y)] = 0 if y < -3 or y > -1.
Thus k(y), the pdf for Y, is given as follows: k(y) = 0 if y < -3;
k(y) = 1/2 if -3 < y < -1; k(y) = 0 if y > -1.

Example 2. X has the pdf f(x) = e^{-x} for x > 0, f(x) = 0 for x < 0.
Y = X³, so X = Y^{1/3} = r(Y), and dr(y)/dy = 1/(3y^{2/3}). f[r(y)] = 0 for
y^{1/3} < 0, and f[r(y)] = e^{-y^{1/3}} for y^{1/3} > 0. Thus k(y) = 0 for
y < 0, k(y) = [1/(3y^{2/3})] e^{-y^{1/3}} for y > 0.

Our discussion so far has assumed that Y is a monotonic function of X.
As an example of a case where this is not so, suppose Y = X² and we
want to find k(y), the pdf for Y. Denoting the cdf for Y by K(y), it is
clear that K(y) = 0 for y < 0. The event (Y < y) is the same event as
(X² < y), and if y > 0, this is the same event as (-√y < X < √y).
Therefore if y > 0, P(Y < y) = P(-√y < X < √y) = F(√y) - F(-√y) = K(y),
and

\[ k(y) = \frac{d}{dy}K(y) = \frac{1}{2\sqrt{y}}\,f(\sqrt{y}) + \frac{1}{2\sqrt{y}}\,f(-\sqrt{y}) \]

For example, if f(x) = (1/√(2π)) e^{-x²/2} for all x, then k(y) = 0
for y < 0; k(y) = (1/√(2πy)) e^{-y/2} for y > 0; k(y) does not exist at y = 0.
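
The non-monotonic case Y = X² can be checked numerically. The sketch below assumes, as an illustrative choice, that X has the standard normal pdf; it compares a central-difference derivative of K(y) = F(√y) - F(-√y) with the formula k(y) = [f(√y) + f(-√y)]/(2√y). The evaluation point and step size are arbitrary choices made for this sketch.

```python
import math

def f(x):
    """Standard normal pdf (assumed distribution of X for this sketch)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def F(x):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def K(y):
    """cdf of Y = X^2: F(sqrt(y)) - F(-sqrt(y)) for y > 0, zero otherwise."""
    if y <= 0:
        return 0.0
    root = math.sqrt(y)
    return F(root) - F(-root)

def k(y):
    """pdf of Y = X^2 from the two-branch formula."""
    if y <= 0:
        return 0.0
    root = math.sqrt(y)
    return (f(root) + f(-root)) / (2.0 * root)

# Compare a numerical derivative of K with the formula for k at one point.
y, step = 1.5, 1e-5
numeric_derivative = (K(y + step) - K(y - step)) / (2.0 * step)
print(numeric_derivative, k(y))
```

For this choice of f, the value of k(y) also coincides with e^{-y/2}/√(2πy), which is the form the two-branch formula reduces to when f is symmetric about zero.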


3.8. Pairs of Chance Variables. We introduced the joint cdf F(x,y)
for a pair of chance variables in Chap. 2. Now suppose we are dealing
with a case where the possible pairs of values of (X, Y) fill a continuum in
two-dimensional space, and F(x,y) is continuous everywhere, and the
second derivative ∂²F(x,y)/∂x ∂y exists, except perhaps at points lying on
a finite number of curves in the plane. In such a case, ∂²F(x,y)/∂x ∂y
is called “the joint pdf for X, Y” and will be denoted by f(x,y).

If a_1, a_2, b_1, b_2 are any given values with a_1 < a_2 and b_1 < b_2, then

\[ \int_{b_1}^{b_2}\int_{a_1}^{a_2} f(x,y)\,dx\,dy = \int_{b_1}^{b_2}\left[\frac{\partial F(a_2,y)}{\partial y} - \frac{\partial F(a_1,y)}{\partial y}\right] dy = F(a_2,b_2) - F(a_2,b_1) - F(a_1,b_2) + F(a_1,b_1) \]

But in Sec. 2.8, we proved that this last expression is equal to
P(a_1 < X < a_2 and b_1 < Y < b_2). Thus we find

\[ \int_{b_1}^{b_2}\int_{a_1}^{a_2} f(x,y)\,dx\,dy = P(a_1 < X < a_2 \text{ and } b_1 < Y < b_2) \]

From Sec. 2.9, we have that

\[ P(X < x) = \lim_{y \to \infty} F(x,y) \]

and this gives

\[ \lim_{x \to -\infty}\lim_{y \to \infty} F(x,y) = \lim_{x \to -\infty} P(X < x) = 0 \]

and

\[ \lim_{x \to \infty}\lim_{y \to \infty} F(x,y) = \lim_{x \to \infty} P(X < x) = 1 \]

Similarly, we find

\[ \lim_{x \to \infty}\lim_{y \to -\infty} F(x,y) = 0 \]

From now on, we shall denote this last double limit by F(∞,-∞), etc. We
have

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = F(\infty,\infty) - F(\infty,-\infty) - F(-\infty,\infty) + F(-\infty,-\infty) = 1 - 0 - 0 + 0 = 1 \]


f(x,y) cannot be negative at any point x, y, for if it were, we could find a
small rectangle (a_1 < x < a_2, b_1 < y < b_2) around that point, throughout
which f(x,y) is negative. But then

\[ \int_{b_1}^{b_2}\int_{a_1}^{a_2} f(x,y)\,dx\,dy \]

would be negative, which is impossible, since we saw above that this
integral gives P(a_1 < X < a_2 and b_1 < Y < b_2).

Thus we have shown that any joint pdf f(x,y) has the following two
properties:

1. \[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1. \]

2. f(x,y) ≥ 0 at every point x, y.

Conversely, any function f(x,y) with these two properties is the joint pdf
for some pair of chance variables X, Y.

P(X < x and Y < y) can be written as P(X < x)P(Y < y | X < x),
and we know from above that

\[ \lim_{x \to -\infty} P(X < x) = 0 \]

Therefore

\[ \lim_{x \to -\infty} P(X < x \text{ and } Y < y) = F(-\infty,y) = 0 \quad \text{for any } y \]

Similarly,

\[ F(x,-\infty) = 0 \quad \text{for any } x \]

Since

\[ \int_{-\infty}^{y}\int_{-\infty}^{x} f(r,s)\,dr\,ds = F(x,y) - F(x,-\infty) - F(-\infty,y) + F(-\infty,-\infty) = F(x,y) - 0 - 0 + 0 = F(x,y) \]

if we are given f(x,y), we can find F(x,y) by integrating.


If X, Y have the joint pdf f(x,y) and if S is a set of points in the plane,
then

\[ P[(X,Y) \text{ in } S] = \iint_S f(x,y)\,dx\,dy \]

This has already been shown for the case where S is a rectangle. In the
general case, S consists of nonoverlapping rectangles and parts of
rectangles and P[(X,Y) in S] is the sum of the probabilities assigned to
these nonoverlapping parts of S. By using the standard argument
involving breaking S into finer and finer pieces, we can show that
P[(X,Y) in S] is equal to the integral of f(x,y) over S.

If X, Y have joint pdf f(x,y) and if g(x,y) is a given function, then

\[ E\{g(X,Y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f(x,y)\,dx\,dy \]

if this integral exists. The physical interpretation of E{g(X,Y)} is the
usual one. As an example of the computation, if f(x,y) = e^{-(x+y)} for
x > 0 and y > 0, f(x,y) = 0 if x < 0 or y < 0, then

\[ E\{XY\} = \int_{0}^{\infty}\int_{0}^{\infty} xy\,e^{-(x+y)}\,dx\,dy = 1 \]
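
The value E{XY} = 1 for the joint pdf e^{-(x+y)} on the positive quadrant can be approximated by a crude double sum. The sketch below is only an illustration: the truncation of the infinite range at 40 and the grid of 400 by 400 cells are arbitrary assumptions, and `expected_value` is a hypothetical helper written for this purpose.

```python
import math

def f(x, y):
    """Joint pdf from the example: e^(-(x+y)) on the positive quadrant."""
    return math.exp(-(x + y)) if x > 0 and y > 0 else 0.0

def expected_value(g, upper=40.0, steps=400):
    """Midpoint-rule approximation to the double integral of g(x,y) f(x,y).

    The infinite range is truncated at `upper`; the mass beyond it is
    negligible for this particular pdf.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        for j in range(steps):
            y = (j + 0.5) * width
            total += g(x, y) * f(x, y)
    return total * width * width

total_mass = expected_value(lambda x, y: 1.0)   # should be close to 1
e_xy = expected_value(lambda x, y: x * y)       # should be close to 1
print(total_mass, e_xy)
```

Both printed values should be within a fraction of one percent of 1; a finer grid brings them closer.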
3.9. Transformation of Chance Variables. Just as in the case of a
single chance variable, sometimes we start with a pair of chance variables
X, Y but find it desirable to deal with a new pair of chance variables
defined as given functions of X and Y. Then the problem is to find the
joint distribution of the new pair of chance variables. In a case where
X, Y have only a finite number of possible pairs of values, we can simply
relabel the possible values to find the joint distribution of the new pair of
chance variables.

A more complicated situation is where X, Y have a joint pdf f(x,y).
Suppose we want to find the joint distribution of the pair of chance
variables W, Z where W = r(X,Y), Z = s(X,Y), the functions r(x,y) and
s(x,y) having continuous first partial derivatives everywhere and also
having the property that for any given values w and z, the simultaneous
equations w = r(x,y) and z = s(x,y) have exactly one solution in x and y.

Then the equations w = r(x,y) and z = s(x,y) can be solved for x and y in
terms of w and z, to give, say, x = t(w,z) and y = u(w,z). We denote by
J(w,z) the determinant

\[ \begin{vmatrix} \dfrac{\partial t(w,z)}{\partial w} & \dfrac{\partial t(w,z)}{\partial z} \\[2ex] \dfrac{\partial u(w,z)}{\partial w} & \dfrac{\partial u(w,z)}{\partial z} \end{vmatrix} \]

which is a function of w and z. We want to find the joint pdf for W, Z.
Denote this joint pdf by k(w,z). For a given function g(w,z), we have

\[ E\{g(W,Z)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(w,z)\,k(w,z)\,dw\,dz \]

But W = r(X,Y) and Z = s(X,Y), so that

\[ E\{g(W,Z)\} = E\{g[r(X,Y), s(X,Y)]\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g[r(x,y), s(x,y)]\,f(x,y)\,dx\,dy \]

In this second integral we make the change of variable w = r(x,y),
z = s(x,y), and the integral becomes

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(w,z)\,f[t(w,z), u(w,z)]\,|J(w,z)|\,dw\,dz \]

which must equal the integral of g(w,z)k(w,z) for every function g(w,z).
The only way for this to happen is to have k(w,z) = f[t(w,z),
u(w,z)] |J(w,z)|. This tells us what the pdf k(w,z) is.

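Two examples illustrate the discussion.

Example 1. f(x,y) = (1/2π) exp [-(1/2)(x² + y²)] for all x, y. W =
X²Y³, Z = X³Y². Then we find X = Z^{3/5}W^{-2/5}, Y = W^{3/5}Z^{-2/5}, so
t(w,z) = z^{3/5}w^{-2/5} and u(w,z) = w^{3/5}z^{-2/5}. Then the determinant J(w,z) is

\[ \begin{vmatrix} -\tfrac{2}{5} z^{3/5} w^{-7/5} & \tfrac{3}{5} z^{-2/5} w^{-2/5} \\[1ex] \tfrac{3}{5} z^{-2/5} w^{-2/5} & -\tfrac{2}{5} w^{3/5} z^{-7/5} \end{vmatrix} = -\tfrac{1}{5}(wz)^{-4/5} \]

Finally, k(w,z) = [(wz)^{-4/5}/10π] exp [-(1/2)(z^{6/5}w^{-4/5} + w^{6/5}z^{-4/5})].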

Example 2. f(x,y) = e^{-x-y} if x > 0 and y > 0; f(x,y) = 0 if either
x < 0 or y < 0. W = 2X + Y, Z = X + 3Y. Then we find X =
(3/5)W - (1/5)Z, Y = -(1/5)W + (2/5)Z, so t(w,z) = (3/5)w - (1/5)z and
u(w,z) = -(1/5)w + (2/5)z. Then the determinant J(w,z) is

\[ \begin{vmatrix} \tfrac{3}{5} & -\tfrac{1}{5} \\[1ex] -\tfrac{1}{5} & \tfrac{2}{5} \end{vmatrix} = \tfrac{1}{5} \]

Finally, k(w,z) = (1/5) exp [-(3/5)w + (1/5)z + (1/5)w - (2/5)z]
= (1/5) exp [-(2/5)w - (1/5)z] if (3/5)w - (1/5)z > 0 and
-(1/5)w + (2/5)z > 0, while k(w,z) = 0 otherwise.
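
The formula k(w,z) = f[t(w,z), u(w,z)] |J(w,z)| can be exercised on the linear transformation W = 2X + Y, Z = X + 3Y with f(x,y) = e^{-x-y} on the positive quadrant. The following sketch is a rough check under stated assumptions: the inverse functions `t` and `u` are obtained by solving the two linear equations by hand, and the grid size and truncation point in `total_probability` are arbitrary illustrative choices.

```python
import math

def f(x, y):
    """Joint pdf of X, Y: e^(-x-y) for x > 0 and y > 0, zero otherwise."""
    return math.exp(-(x + y)) if x > 0 and y > 0 else 0.0

def t(w, z):
    """x in terms of w, z, from solving w = 2x + y, z = x + 3y."""
    return (3.0 * w - z) / 5.0

def u(w, z):
    """y in terms of w, z, from the same pair of equations."""
    return (-w + 2.0 * z) / 5.0

# Determinant of the partial derivatives of t and u (constants here).
J = (3.0 / 5.0) * (2.0 / 5.0) - (-1.0 / 5.0) * (-1.0 / 5.0)

def k(w, z):
    """Joint pdf of W, Z from the transformation formula."""
    return f(t(w, z), u(w, z)) * abs(J)

def total_probability(upper=40.0, steps=800):
    """Midpoint-rule double sum of k over (0, upper) x (0, upper)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * width
        for j in range(steps):
            z = (j + 0.5) * width
            total += k(w, z)
    return total * width * width

# Round trip: start from (x, y), transform, and invert.
x0, y0 = 0.7, 1.9
w0, z0 = 2.0 * x0 + y0, x0 + 3.0 * y0
mass = total_probability()
print(J, t(w0, z0), u(w0, z0), mass)
```

The total mass is only approximate because k(w,z) jumps to zero along the edges of the region where t and u are positive, so a grid sum picks up a small boundary error.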


3.10. Marginal and Conditional pdf's. If X, Y have the joint pdf
f(x,y), we can write the marginal cdf F_1(x) as

\[ F_1(x) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f(r,y)\,dy\,dr \]

and

\[ \frac{dF_1(x)}{dx} = \int_{-\infty}^{\infty} f(x,y)\,dy \]

Thus the marginal pdf for X, usually denoted by f_1(x), is given by
the integral of f(x,y) over all y. Similarly, f_2(y), the marginal pdf
for Y, is given by the integral of f(x,y) over all x.

Suppose R is a set of points on the y axis with the integral of f_2(y)
over R greater than zero. The conditional cdf for X given that Y is in R,
F(x | Y in R), is equal to

\[ P(X < x \mid Y \text{ in } R) = \frac{P(X < x \text{ and } Y \text{ in } R)}{P(Y \text{ in } R)} = \frac{\int_{-\infty}^{x}\int_R f(r,y)\,dy\,dr}{\int_R f_2(y)\,dy} \]

Then

\[ \frac{\partial F(x \mid Y \text{ in } R)}{\partial x} = \frac{\int_R f(x,y)\,dy}{\int_R f_2(y)\,dy} \]

which we call the conditional pdf for X given that Y is in R, denoted by
f(x | Y in R). The conditional expected value of g(X) given that Y is in
R is defined as

\[ \int_{-\infty}^{\infty} g(x)\,f(x \mid Y \text{ in } R)\,dx \]

and is denoted by E{g(X) | Y in R}.

If the set R is taken as the interval (y, y + Δy), we find

\[ \lim_{\Delta y \to 0} f(x \mid y < Y < y + \Delta y) = \frac{f(x,y)}{f_2(y)} \]

and this last expression is called “the conditional pdf for X given that
Y = y” and is denoted by f(x | Y = y). E{g(X) | Y = y} is

\[ \int_{-\infty}^{\infty} g(x)\,f(x \mid Y = y)\,dx \]

The chance variables X, Y were defined to be independent if F(x,y) =
F_1(x)F_2(y) for all values x, y. If X, Y have joint pdf f(x,y) and X, Y are
independent, we have

\[ F(x,y) = F_1(x)F_2(y) \]
\[ \frac{\partial F(x,y)}{\partial y} = F_1(x)\,f_2(y) \]
\[ \frac{\partial^2 F(x,y)}{\partial x\,\partial y} = f_1(x)\,f_2(y) \]

so f(x,y) = f_1(x)f_2(y) for all x, y. Conversely, if f(x,y) = f_1(x)f_2(y) for
all x, y, we have

\[ F(x,y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(r,s)\,dr\,ds = \int_{-\infty}^{y}\int_{-\infty}^{x} f_1(r)f_2(s)\,dr\,ds = \left[\int_{-\infty}^{x} f_1(r)\,dr\right]\left[\int_{-\infty}^{y} f_2(s)\,ds\right] = F_1(x)F_2(y) \]

so that X, Y are independent if and only if f(x,y) = f_1(x)f_2(y) for all x, y.
It is easily verified that if X, Y are independent, then f(x | Y in R) = f_1(x),
f(x | Y = y) = f_1(x), E{g(X) | Y in R} = E{g(X)}, with similar statements
for f(y | X in S), etc.
As an example of a joint distribution for two chance variables which
are independent, suppose f(x,y) = 1 when 0 < x < 1 and 0 < y < 1,
while f(x,y) = 0 if x < 0 or x > 1 or y < 0 or y > 1. Then we find
that if 0 < x < 1,

\[ f_1(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \int_{0}^{1} f(x,y)\,dy = \int_{0}^{1} 1\,dy = 1 \]

while if x < 0 or x > 1,

\[ f_1(x) = \int_{-\infty}^{\infty} 0\,dy = 0 \]

Similarly, f_2(y) = 1 for 0 < y < 1, f_2(y) = 0 for y < 0 or y > 1. Thus
we have f(x,y) = f_1(x)f_2(y) for all x, y.

As an example of a joint distribution for two chance variables which are
not independent, suppose f(x,y) = 2 when 0 < x and 0 < y and
x + y < 1, while f(x,y) = 0 otherwise. From this, we find that
f_1(x) = 2(1 - x) if 0 < x < 1, f_1(x) = 0 otherwise. Also, f_2(y) = 2(1 - y)
if 0 < y < 1, f_2(y) = 0 otherwise. Thus f(x,y) is not equal to f_1(x)f_2(y).


3.11. More Than Two Jointly Distributed Chance Variables. If
X_1, X_2, ..., X_n is a set of n jointly distributed chance variables, their
joint cdf F(x_1, x_2, ..., x_n) is defined as P(X_1 < x_1 and ... and X_n < x_n).
If any of the n quantities x_1, ..., x_n is -∞, then F(x_1, ..., x_n) = 0.
F(∞, ..., ∞) = 1. The marginal cdf for X_1, X_2, X_3, for example, is
given by F(x_1, x_2, x_3, ∞, ..., ∞) and is denoted by F_{1,2,3}(x_1,x_2,x_3). The
marginal cdf for X_i, denoted by F_i(x_i), is given by F(∞, ..., ∞,
x_i, ∞, ..., ∞). X_1, X_2, ..., X_n are independent if F(x_1, x_2, ..., x_n) =
F_1(x_1)F_2(x_2) ... F_n(x_n) for all values of x_1, x_2, ..., x_n.

If F(x_1, x_2, ..., x_n) is continuous everywhere, and

\[ \frac{\partial^n F(x_1, x_2, \ldots, x_n)}{\partial x_1\,\partial x_2 \cdots \partial x_n} \]

exists everywhere, with the possible exception of a finite number of
curves in n-dimensional space, this derivative is called the joint pdf for
X_1, ..., X_n and is denoted by f(x_1, ..., x_n). If R is a set of points in
n-dimensional space, then

\[ P[(X_1, \ldots, X_n) \text{ in } R] = \int \cdots \int_R f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \]

E{g(X_1, ..., X_n)} is defined as

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\,f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \]

If new chance variables Y_1, ..., Y_n are defined by Y_i = r_i(X_1, ..., X_n),
where the functions r_i(x_1, ..., x_n) have continuous first partial
derivatives and allow a unique solution for X_1, ..., X_n in terms of
Y_1, ..., Y_n as X_i = t_i(Y_1, ..., Y_n), then the joint pdf for Y_1, ..., Y_n is

\[ f[t_1(y_1, \ldots, y_n), \ldots, t_n(y_1, \ldots, y_n)]\,|J(y_1, \ldots, y_n)| \]

where J(y_1, ..., y_n) is the n by n determinant with the quantity
∂t_i(y_1, ..., y_n)/∂y_j in the ith row and jth column.

If f(x_1, ..., x_n) is the joint pdf for X_1, ..., X_n, then the joint marginal
pdf for X_1, X_2, X_3, for example, is equal to

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_4 \cdots dx_n \]

and is denoted by f_{1,2,3}(x_1,x_2,x_3). The marginal pdf for X_i is

\[ f_i(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n \]

X_1, X_2, ..., X_n are independent if and only if f(x_1, ..., x_n) =
f_1(x_1)f_2(x_2) ... f_n(x_n) for all values of x_1, x_2, ..., x_n. If X_1, X_2, ..., X_n are
independent and g_1(x_1), ..., g_n(x_n) are given functions, it is easily seen
that

\[ E\{g_1(X_1)g_2(X_2) \cdots g_n(X_n)\} = E\{g_1(X_1)\}\,E\{g_2(X_2)\} \cdots E\{g_n(X_n)\} \]

If f(x_1, ..., x_n) is the joint pdf for X_1, ..., X_n, then the conditional
pdf for X_1, X_2, X_3, given that X_4 = x_4 and ... and X_n = x_n,
is defined as

\[ \frac{f(x_1, \ldots, x_n)}{f_{4,\ldots,n}(x_4, \ldots, x_n)} \]

and is denoted by f_{1,2,3}(x_1,x_2,x_3 | X_4 = x_4, ..., X_n = x_n). If X_1, ..., X_n are
independent, then f_{1,2,3}(x_1,x_2,x_3 | X_4 = x_4, ..., X_n = x_n) = f_{1,2,3}(x_1,x_2,x_3)
for all values of x_1, ..., x_n.

We often encounter the following problem. X_1, ..., X_n have a
given joint pdf f(x_1, ..., x_n). Y_1, ..., Y_m are m chance variables
defined as given functions of X_1, ..., X_n, where m < n. The problem is
to find the joint pdf for Y_1, ..., Y_m. If m were equal to n, we could use the
methods already given to solve this problem. Suppose Y_i = r_i(X_1, ...,
X_n) for i = 1, ..., m. To solve the problem, we introduce n - m
conveniently chosen extra chance variables Z_1, ..., Z_{n-m} by the relations
Z_i = s_i(X_1, ..., X_n) for i = 1, ..., n - m, where the functions
s_i(x_1, ..., x_n) are some conveniently chosen ones. Then, by methods
already described, we can find the joint pdf for Y_1, ..., Y_m, Z_1, ..., Z_{n-m},
say, g(y_1, ..., y_m, z_1, ..., z_{n-m}). Finally, the required joint pdf for
Y_1, ..., Y_m is given by

\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, \ldots, y_m, z_1, \ldots, z_{n-m})\,dz_1 \cdots dz_{n-m} \]

As an example, suppose X_1, ..., X_n are independent, each having
the marginal pdf f(x) = e^{-x} for x > 0, f(x) = 0 for x < 0. We want to
find the pdf for Y = X_1 + ... + X_n. The joint pdf for X_1, ..., X_n
is e^{-(x_1 + ... + x_n)} if x_1 > 0 and ... and x_n > 0 and is equal to zero
otherwise. We define the chance variables Z_1, ..., Z_{n-1} by
Z_i = X_1 + ... + X_i for i = 1, ..., n - 1. Then the joint pdf for
Z_1, ..., Z_{n-1}, Y is found to be e^{-y} for 0 < z_1 < z_2 < ... < z_{n-1} < y
and zero otherwise. Then the pdf for Y is zero if y < 0, and for y > 0 is
given by

\[ \int_{0}^{y}\int_{0}^{z_{n-1}} \cdots \int_{0}^{z_2} e^{-y}\,dz_1\,dz_2 \cdots dz_{n-2}\,dz_{n-1} = \frac{y^{n-1}e^{-y}}{(n-1)!} \]
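
The result y^{n-1}e^{-y}/(n-1)! for the sum of n independent chance variables, each with pdf e^{-x} for x > 0, can be checked by a different route than the one used in the text. The sketch below does not use the extra-variable method; instead it adds one variable at a time by a convolution integral, which is a substituted technique chosen here only because it is easy to code. The function names and the midpoint rule are assumptions of this sketch.

```python
import math

def pdf_sum(y, n):
    """Claimed pdf of X_1 + ... + X_n: y^(n-1) e^(-y) / (n-1)! for y > 0."""
    if y <= 0:
        return 0.0
    return y ** (n - 1) * math.exp(-y) / math.factorial(n - 1)

def convolve_with_exponential(y, n, steps=4000):
    """pdf at y of (sum of n-1 variables) plus one more independent variable.

    Uses the midpoint rule on the integral over s from 0 to y of
    pdf_sum(s, n-1) * e^(-(y-s)).
    """
    width = y / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * width
        total += pdf_sum(s, n - 1) * math.exp(-(y - s))
    return total * width

y, n = 2.5, 4
by_convolution = convolve_with_exponential(y, n)
by_formula = pdf_sum(y, n)
print(by_convolution, by_formula)
```

Agreement of the two numbers for several values of n is consistent with the closed form, since adding one more variable to a sum of n - 1 should turn the formula for n - 1 into the formula for n.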


Chapter 4


SOME IMPORTANT EXPECTED VALUES
AND DISTRIBUTIONS


4.1. Moments and the Moment Generating Function. If X is a chance
variable and r and c are two given constants, then E{(X - c)^r} is called
“the rth moment of X about c,” if this expected value exists. The rth
moment of X about zero is usually just called “the rth moment.” The
first moment of X, E{X}, is also called “the mean of X,” and the second
moment of X about E{X} is called “the variance of X.” The variance of
X is

\[ E\{(X - E\{X\})^2\} = E\{X^2 - 2E\{X\}X + [E\{X\}]^2\} = E\{X^2\} - 2[E\{X\}]^2 + [E\{X\}]^2 = E\{X^2\} - [E\{X\}]^2 \]

The positive square root of the variance of X is called “the standard
deviation of X.”

If X is a chance variable and t is a given value, then E{e^{tX}}, as a function
of t, is called “the moment generating function for X” and will usually be
denoted by M_X(t). It is important to realize that M_X(t) is a function of
the ordinary variable t, and not of the chance variable X, though the form
of the function is determined by the distribution of X.

If E{X^s} exists for every positive integer s, then E{X^s} = d^sM_X(t)/dt^s]_{t=0}.
(This is the reason for the name “moment generating function.”) We
prove this when X has only a finite number of possible values, with
probability distribution given in general terms as in Sec. 2.2:

Possible values | x_1  x_2  ...  x_k
Probabilities   | p_1  p_2  ...  p_k

(In this case, E{X^s} exists for all positive values of s.) We have

\[ M_X(t) = E\{e^{tX}\} = \sum_{i=1}^{k} p_i e^{t x_i} \]

and it is easily seen that

\[ \frac{d^s M_X(t)}{dt^s} = \sum_{i=1}^{k} p_i x_i^s e^{t x_i} \]

so that

\[ \left.\frac{d^s M_X(t)}{dt^s}\right]_{t=0} = \sum_{i=1}^{k} p_i x_i^s = E\{X^s\} \]

which completes the proof when X has only a finite number of possible
values. When X has an infinite number of possible values which can be
listed, as in Sec. 3.1, the demonstration goes through the same way, since
the differentiation of the infinite series term by term can be justified by
the assumed existence of the moments. In the case where X has a pdf
f(x), then

\[ M_X(t) = E\{e^{tX}\} = \int_{-\infty}^{\infty} f(x)\,e^{tx}\,dx \]

and

\[ \frac{d^s M_X(t)}{dt^s} = \int_{-\infty}^{\infty} x^s f(x)\,e^{tx}\,dx \]

so that

\[ \left.\frac{d^s M_X(t)}{dt^s}\right]_{t=0} = \int_{-\infty}^{\infty} x^s f(x)\,dx = E\{X^s\} \]

(Differentiating under the integral sign can be justified by the assumed
existence of the moments.)
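
The relation between moments and derivatives of the moment generating function at t = 0 can be illustrated numerically. In the sketch below the three possible values and their probabilities are made up for illustration, and the derivatives are approximated by central differences with an arbitrarily chosen step, so agreement is only approximate.

```python
import math

# A hypothetical finite distribution chosen only for illustration.
values = [0.0, 1.0, 3.0]
probabilities = [0.2, 0.5, 0.3]

def mgf(t):
    """Moment generating function: sum of p_i * exp(t * x_i)."""
    return sum(p * math.exp(t * x) for x, p in zip(values, probabilities))

def moment(s):
    """sth moment about zero computed directly: sum of p_i * x_i^s."""
    return sum(p * x ** s for x, p in zip(values, probabilities))

# Central-difference approximations to the first two derivatives at t = 0.
step = 1e-4
first_derivative = (mgf(step) - mgf(-step)) / (2.0 * step)
second_derivative = (mgf(step) - 2.0 * mgf(0.0) + mgf(-step)) / step ** 2

variance = moment(2) - moment(1) ** 2
print(first_derivative, moment(1))
print(second_derivative, moment(2))
print(variance)
```

For this made-up distribution the mean is 1.4 and the second moment is 3.2, so the variance is 3.2 - 1.96 = 1.24.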


Although it is sometimes very convenient to compute moments by
differentiating moment generating functions, a more important use of the
moment generating function is that the distribution of X can often be
discovered by recognition of the moment generating function for X. We
shall discuss many such cases and need the following two theorems,
which we state without proof:

Theorem 1. If M(t) is a moment generating function for some chance
variable and M(t) is finite for all values t in some interval (-h,h), where h is
some positive value, then there is exactly one probability distribution
corresponding to M(t).

Theorem 2. Suppose M_1(t), M_2(t), ... is a sequence of moment
generating functions, with corresponding cdf's F_1(x), F_2(x), ..., each
assumed finite for all t in some interval (-h,h).
Suppose M(t) is a moment generating function with corresponding cdf F(x)
and suppose

\[ \lim_{n \to \infty} M_n(t) = M(t) \]

for each t in the interval (-h,h). Then

\[ \lim_{n \to \infty} F_n(x) = F(x) \]

at each x at which F(x) is continuous.

4.2. The Binomial Distribution. If a chance variable X has the
possible values 0, 1, 2, ..., n, where n is a given positive integer, and if
P(X = x) = [n!/(x!(n - x)!)] p^x (1 - p)^{n-x} for each integer x between 0
and n, where p is a given value between 0 and 1, then X is said to have a
binomial distribution with parameters n, p. From the discussion in Sec. 1.8,
it can be seen that if n independent trials are made, on each of which the
outcome E has probability p of occurring, and if the chance variable X is
the number of trials on which E will occur, then X has a binomial
distribution with parameters n, p.

If X has a binomial distribution with parameters n, p, then the moment
generating function for X, M_X(t), is equal to

\[ \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,p^x(1-p)^{n-x}e^{tx} = \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,(pe^t)^x(1-p)^{n-x} = (pe^t + 1 - p)^n \]

Differentiating this last expression, we find E{X} = dM_X(t)/dt]_{t=0} = np,
and E{X²} = d²M_X(t)/dt²]_{t=0} = n(n - 1)p² + np, so that the variance of
X is equal to n(n - 1)p² + np - (np)² = np(1 - p).
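
The closed form (pe^t + 1 - p)^n and the values np and np(1 - p) can be confirmed for particular numbers. The sketch below uses n = 10, p = 0.3, and t = 0.4, which are arbitrary illustrative choices, and sums the binomial probabilities directly.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n, p."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def mgf_from_sum(t, n, p):
    """E{exp(tX)} computed term by term from the probabilities."""
    return sum(math.exp(t * x) * binomial_pmf(x, n, p) for x in range(n + 1))

def mgf_closed_form(t, n, p):
    """The closed form (p e^t + 1 - p)^n."""
    return (p * math.exp(t) + 1.0 - p) ** n

n, p, t = 10, 0.3, 0.4
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
second_moment = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = second_moment - mean ** 2

print(mgf_from_sum(t, n, p), mgf_closed_form(t, n, p))
print(mean, n * p)
print(variance, n * p * (1.0 - p))
```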

4.3. The Poisson Distribution. If a chance variable X has possible
values 0, 1, 2, ..., and P(X = x) = λ^x e^{-λ}/x! for each nonnegative
integer x, λ being a positive value, then X is said to have a Poisson
distribution with parameter λ.

\[ M_X(t) = \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda}\sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda}\exp(\lambda e^t) \]

from which E{X} = λ, E{X²} = λ + λ², and the variance of X is equal
to λ.

If X has a binomial distribution with parameters n, p, and if n is large
and p is small, then the distribution of X is “almost” a Poisson distribution
with parameter equal to np. To see this, suppose X has a binomial
distribution with parameters n, p, assume n is large and p is small, and
denote np by m. Choose a positive integer x. Then

\[ P(X = x) = \frac{n!}{x!\,(n-x)!}\,p^x(1-p)^{n-x} = \frac{m^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^x}\cdot\frac{(1 - m/n)^n}{(1 - m/n)^x} \]

Since n is large, n(n - 1)...(n - x + 1)/n^x and (1 - m/n)^x are each
close to 1, and (1 - m/n)^n is close to e^{-m}, so that P(X = x) is close to
m^x e^{-m}/x!, which is the probability for a Poisson distribution with
parameter m.

The argument of the preceding paragraph can be stated more exactly as
follows: if n increases and p decreases with np = m (m a fixed
value) then [n!/(x!(n - x)!)] p^x (1 - p)^{n-x} approaches m^x e^{-m}/x!. And
the fact that the binomial distribution approaches the Poisson distribution
as n increases with np = m (m fixed) also follows from Theorem 2 of
Sec. 4.1. For the moment generating function for the binomial distribution
is (1 + pe^t - p)^n = [1 + (m/n)(e^t - 1)]^n, and this last expression
approaches exp [m(e^t - 1)] as n increases, exp [m(e^t - 1)] being the
moment generating function for the Poisson distribution with parameter m.

Chance variables with Poisson distributions are often encountered
in practice. For example, it has been found that the number of persons
who will contract a rare noncontagious disease in a given time period has
a Poisson distribution, at least approximately. This is explained as
follows. Each person is to be considered a “trial,” and if the person
contracts the disease in the given time period, we say the outcome E occurs
on the trial. Assume each person has the same probability (p, say) of
contracting the disease in the time period. Since the disease is
noncontagious, the trials represented by different people are independent.
If n denotes the total number of people under consideration and X
denotes the number of people who will contract the disease in the given
time period, then X has a binomial distribution with parameters n, p.
But n is large and p is small (it is a rare disease), and so X has a Poisson
distribution, at least approximately.
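
How close the binomial probabilities are to the Poisson probabilities when n is large and p is small can be seen from a direct comparison. The values n = 1000 and p = 0.002 below are illustrative assumptions standing in for "a large population and a rare disease"; they are not taken from any data.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n, p."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def poisson_pmf(x, m):
    """P(X = x) for a Poisson distribution with parameter m."""
    return m ** x * math.exp(-m) / math.factorial(x)

n, p = 1000, 0.002          # hypothetical large n and small p
m = n * p                   # Poisson parameter np

largest_gap = 0.0
for x in range(0, 11):
    b = binomial_pmf(x, n, p)
    q = poisson_pmf(x, m)
    largest_gap = max(largest_gap, abs(b - q))
    print(x, round(b, 5), round(q, 5))

print("largest gap:", largest_gap)
```

With these numbers the two columns agree to about three decimal places; keeping np fixed and increasing n makes the gap smaller still.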


4.4. The Hypergeometric Distribution. Suppose we have a box
containing R red chips and B black chips, and we mix the chips thoroughly
and take out n chips, where n ≤ R + B. Let X denote the number of
red chips that will be found among the n chips chosen. We want to find
the probability distribution of X. Clearly, the possible values of X are
the integers between max (0, n - B) and min (R, n). If x is any integer in
this range, then

\[ P(X = x) = \frac{\dfrac{R!}{x!\,(R-x)!}\cdot\dfrac{B!}{(n-x)!\,(B-n+x)!}}{\dfrac{(R+B)!}{n!\,(R+B-n)!}} \]

To see this, let us imagine that each chip is distinguished from all the
others in some convenient way (by labeling each with its own number,
perhaps). Any particular set of n chips has the same probability of
being chosen as any other particular set of n chips, since we assumed
thorough mixing. Therefore the probability of getting exactly x red
chips is the proportion of sets of n chips containing exactly x red chips.
The total number of different sets of n chips is (R + B)!/[n! (R + B - n)!],
while the number of sets of n chips containing exactly x red
chips is the number of ways of picking x red chips out of R multiplied by
the number of ways of picking n - x black chips out of B, the product
being equal to

\[ \frac{R!}{x!\,(R-x)!}\cdot\frac{B!}{(n-x)!\,(B-n+x)!} \]

If a chance variable X has the probability distribution just described, it
is said to have a hypergeometric distribution with parameters n, R, B.

A hypergeometric distribution with n small compared with R + B is
almost a binomial distribution with parameters n, R/(R + B). To see
this, we may think of drawing the n chips from the box one by one.
Since n is small compared with the total number R + B of chips, the
composition of the box will not change very much, and at each drawing
the probability of getting a red chip will be approximately R/(R + B).
But then the number of red chips that will be drawn has almost the same
distribution as the number of heads that will appear in n tosses of a coin
with probability of a head on each toss equal to R/(R + B). This latter
distribution is binomial with parameters n, R/(R + B).
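
The counting argument and the binomial approximation can both be tried on a small case. The sketch below assumes a hypothetical box with R = 300 red chips and B = 700 black chips and a draw of n = 5; the function names are illustrative.

```python
import math

def hypergeometric_pmf(x, n, R, B):
    """Proportion of sets of n chips that contain exactly x red chips."""
    if x < max(0, n - B) or x > min(R, n):
        return 0.0
    return math.comb(R, x) * math.comb(B, n - x) / math.comb(R + B, n)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n, p."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

R, B, n = 300, 700, 5       # hypothetical box and sample size
p = R / (R + B)

total = sum(hypergeometric_pmf(x, n, R, B) for x in range(n + 1))
largest_gap = max(
    abs(hypergeometric_pmf(x, n, R, B) - binomial_pmf(x, n, p))
    for x in range(n + 1)
)
print(total, largest_gap)
```

The probabilities sum to 1, and because n is small compared with R + B the largest difference from the binomial probabilities is well under one percentage point.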

4.5. The Uniform Distribution. If a chance variable X has a pdf
f(x) given as follows:

f(x) = 1/(B - A)   for A < x < B
f(x) = 0           for x < A or x > B

then X is said to have a uniform distribution between A and B.

Suppose Y is a chance variable with a continuous cdf G(y) and we
define the chance variable Z as G(Y). Then Z has a uniform distribution
between 0 and 1. To see this, first we note that Z can take values between
0 and 1 and no others. Fix a value z between 0 and 1, and define y(z) as
the largest value y for which G(y) = z, so that G[y(z)] = z, but
G[y(z) + Δ] > z for every positive value of Δ. Then the event (Z < z) is
the same event as [G(Y) < z], which in turn is the same event as
[Y < y(z)]. Therefore, P(Z < z) = P[Y < y(z)] = G[y(z)] = z. Thus K(z),
the cdf for Z, is given as follows:

K(z) = 0   for z < 0
K(z) = z   for 0 < z < 1
K(z) = 1   for z > 1

Differentiating, we find that k(z), the pdf for Z, is

k(z) = 1   for 0 < z < 1
k(z) = 0   for z < 0 or z > 1

that is, Z has a uniform distribution between 0 and 1.
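
The statement that Z = G(Y) is uniform between 0 and 1 can be illustrated by simulation. The sketch below assumes, as an example only, that Y has the cdf G(y) = 1 - e^{-y} for y > 0; the seed and sample size are arbitrary, and the agreement is statistical rather than exact.

```python
import math
import random

def G(y):
    """cdf of Y (assumed for illustration): 1 - e^(-y) for y > 0."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0

random.seed(12345)          # fixed seed so the run is repeatable
samples = [G(random.expovariate(1.0)) for _ in range(20000)]

def empirical_cdf(z):
    """Fraction of simulated values of Z = G(Y) that fall below z."""
    return sum(1 for value in samples if value < z) / len(samples)

for z in (0.1, 0.25, 0.5, 0.9):
    print(z, empirical_cdf(z))
```

Each printed fraction should be close to z itself, which is what K(z) = z for 0 < z < 1 predicts.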


4.6. The Normal Distribution. If a chance variable X has the pdf
f(x) given as follows:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2} \]

where σ is positive, then X is said to have a normal distribution with
parameters μ, σ. If X has a normal distribution with μ = 0 and σ = 1,
X is said to have a standard normal distribution.

If X has a normal distribution with parameters μ, σ, then

\[ M_X(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx}\,e^{-(x-\mu)^2/2\sigma^2}\,dx \]

To evaluate this integral, we make the change of variable y = (x - μ)/σ,
getting

\[ M_X(t) = e^{\mu t + \sigma^2 t^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(y - \sigma t)^2/2}\,dy \]

Evaluating the integral in the last expression by making the change of
variable z = y - σt, we find M_X(t) = e^{μt + σ²t²/2}. Differentiating, we
find that E{X} = μ, E{X²} = σ² + μ², so that the variance of X is σ².

If X has a normal distribution with mean μ and variance σ² and A,
B are constants, then the chance variable Y defined as AX + B has a
normal distribution with mean Aμ + B and variance A²σ². To show
this, we write

\[ E\{e^{tY}\} = E\{e^{t(AX+B)}\} = E\{e^{tB}e^{tAX}\} = e^{tB}E\{e^{(tA)X}\} = e^{tB}M_X(tA) = e^{tB}e^{\mu A t + (1/2)A^2 t^2 \sigma^2} = e^{(A\mu + B)t + (1/2)(A^2\sigma^2)t^2} \]

and this last expression is seen to be the moment generating function for
a chance variable with a normal distribution with mean Aμ + B and
variance A²σ².

If X has a normal distribution, the value of P(X < x) may be found
from Table 1 in the Appendix. Thus, suppose X has a normal distribution
with mean μ and variance σ², and we want to find P(X < c), which
is given by

\[ \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{c} e^{-(x-\mu)^2/2\sigma^2}\,dx \]

To evaluate this integral, we make the change of variable y = (x - μ)/σ
and find that

\[ P(X < c) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(c-\mu)/\sigma} e^{-y^2/2}\,dy \]

But Table 1 in the Appendix gives values of

\[ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-y^2/2}\,dy \]

for various values of z. As a numerical example, if μ = 2, σ = 5, and
c = 10, we enter Table 1 in the Appendix with z = (c - μ)/σ = 1.6, and
find that P(X < c) is approximately 0.945.
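
When a table of the standard normal cdf is not at hand, the same lookup can be done with the error function. The sketch below uses the figures of the numerical example as read here (μ = 2, c = 10, z = 1.6, with σ = 5 inferred from those three numbers); `standard_normal_cdf` is a helper defined for this sketch through the identity Φ(z) = [1 + erf(z/√2)]/2.

```python
import math

def standard_normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(c, mu, sigma):
    """P(X < c) for a normal distribution with mean mu, standard deviation sigma."""
    return standard_normal_cdf((c - mu) / sigma)

mu, sigma, c = 2.0, 5.0, 10.0   # sigma = 5 is inferred from z = 1.6
z = (c - mu) / sigma
prob = normal_cdf(c, mu, sigma)
print(z, prob)
```

The standardized value is 1.6 and the probability is about 0.9452, in line with what a table of the standard normal cdf gives for z = 1.6.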


4.7. The Central Limit Theorem. It has long been customary to
assume that many chance variables encountered in nature have probability
distributions which are approximately normal. In this section we
explain the basis for this assumption.

First we develop some simple facts about moment generating functions,
in the following two lemmas.

Lemma 1. If the chance variable Z has moment generating function
M_Z(t), and if A and B are constants, then the moment generating function
for the chance variable AZ + B is equal to e^{Bt}M_Z(At).

Proof. See the paragraph of Sec. 4.6 dealing with the chance variable
AX + B.

Lemma 2. If Z_1, Z_2, ..., Z_n are independent chance variables, with
moment generating functions M_1(t), M_2(t), ..., M_n(t), respectively,
then the moment generating function for the chance variable Z_1 + Z_2 +
... + Z_n is M_1(t)M_2(t) ... M_n(t).

Proof.

\[ E\{e^{t(Z_1 + \cdots + Z_n)}\} = E\{e^{tZ_1}e^{tZ_2} \cdots e^{tZ_n}\} = E\{e^{tZ_1}\}E\{e^{tZ_2}\} \cdots E\{e^{tZ_n}\} = M_1(t)M_2(t) \cdots M_n(t) \]

If the chance variable Z has moment generating function M_Z(t) and
M_Z(t) is finite for all values of t in some interval (-h,h), where h is a
positive quantity, then M_Z(t) can be expanded in a Maclaurin expansion,
giving

\[ M_Z(t) = M_Z(0) + t\left.\frac{dM_Z(t)}{dt}\right]_{t=0} + \frac{t^2}{2!}\left.\frac{d^2M_Z(t)}{dt^2}\right]_{t=0} + t^2\Delta(t) = 1 + tE\{Z\} + \frac{t^2}{2}E\{Z^2\} + t^2\Delta(t) \]

where Δ(t) is a function of t which approaches zero as t approaches zero.
We note that if E{Z} = 0, then the variance of Z is equal to E{Z²}.

For the rest of this section, we assume that Z_1, Z_2, ... are
independent chance variables, with variances denoted by σ_1², σ_2², ..., and
that there are two finite positive numbers A, B, with A < σ_i² < B for all i.
We assume that the moment generating function for Z_i, denoted by
M_i(t), may be written as 1 + tE{Z_i} + (t²/2)E{Z_i²} + t²Δ_i(t), and that
there exists a function Δ(t) which approaches zero as t approaches
zero, such that |Δ_i(t)| < Δ(t) for all values of t and all i. With these
assumptions we have the following theorem, known as the “central limit
theorem.”

Theorem. If E{Z_i} = 0 for all i, then as n increases,

\[ P\left(\frac{Z_1 + Z_2 + \cdots + Z_n}{\sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}} < y\right) \text{ approaches } \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-x^2/2}\,dx \]

for each value of y.

Proof. For simplicity, we give the proof for the case where σ_1² =
σ_2² = ... = σ², say, and Δ_1(t) = Δ_2(t) = ... = Δ(t). The proof in the
more general case is practically the same, but the typography becomes
complicated. Denote the moment generating function for

\[ \frac{Z_1 + Z_2 + \cdots + Z_n}{\sqrt{n\sigma^2}} \]

by M_{(n)}(t). Using Lemmas 1 and 2, we can write

\[ M_{(n)}(t) = M_1\!\left(\frac{t}{\sigma\sqrt{n}}\right)M_2\!\left(\frac{t}{\sigma\sqrt{n}}\right) \cdots M_n\!\left(\frac{t}{\sigma\sqrt{n}}\right) = \left[1 + \frac{t^2\sigma^2}{2n\sigma^2} + \frac{t^2}{n\sigma^2}\,\Delta\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n \]

and as n increases, M_{(n)}(t) approaches e^{t²/2} for each t. But e^{t²/2} is the
moment generating function for a standard normal distribution, and Theorem
2 of Sec. 4.1 tells us that the cdf for

\[ \frac{Z_1 + Z_2 + \cdots + Z_n}{\sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}} \]

approaches the cdf for the standard normal distribution, which completes
the proof of the central limit theorem.

Actually, the central limit theorem holds under weaker restrictions
than we imposed, but its proof becomes more complicated. However,
one restriction that is easily removed is that E{Z_i} = 0, since the same
proof as above serves to show that the cdf of the chance variable

\[ \frac{Z_1 + \cdots + Z_n - E\{Z_1\} - \cdots - E\{Z_n\}}{\sqrt{\sigma_1^2 + \cdots + \sigma_n^2}} \]

approaches the standard normal cdf as n increases (noting that the
chance variable Z_i - E{Z_i} has its mean equal to zero).

One important application of the central limit theorem is to show that
under certain conditions, a binomial cdf is close to a normal cdf. If X
has a binomial distribution with parameters n, p, then X has the same
distribution as the chance variable Z_1 + Z_2 + ... + Z_n, where
Z_1, ..., Z_n are independent chance variables, each with the
following probability distribution:

Possible values | 0      1
Probabilities   | 1 - p  p

Then E{Z_i} = p and the variance of Z_i is p(1 - p), and the central
limit theorem states that the cdf for the chance variable

\[ \frac{Z_1 + \cdots + Z_n - np}{\sqrt{np(1-p)}} \]

approaches the standard normal cdf as n increases.

Because of the central limit theorem, a normal distribution is often
assumed for chance variables because it is felt that they are “built up”
as the sum of many independent components. However, verifying that
the conditions of the central limit theorem are satisfied is usually
difficult, and a tendency to assume that chance variables are normally
distributed should be resisted.
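
The closeness of a binomial cdf to a normal cdf can be examined directly. The sketch below assumes n = 400 and p = 0.3 as illustrative values and compares P(X ≤ x) with the standard normal cdf evaluated at (x - np)/√(np(1 - p)); no continuity correction is applied, so gaps of a few hundredths are to be expected.

```python
import math

def binomial_cdf(x, n, p):
    """P(X <= x) for a binomial distribution, by direct summation."""
    return sum(
        math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(0, x + 1)
    )

def standard_normal_cdf(y):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

n, p = 400, 0.3             # hypothetical parameters for illustration
mean = n * p
sd = math.sqrt(n * p * (1.0 - p))

points = [100, 110, 120, 130, 140]
gaps = []
for x in points:
    exact = binomial_cdf(x, n, p)
    approx = standard_normal_cdf((x - mean) / sd)
    gaps.append(abs(exact - approx))
    print(x, round(exact, 4), round(approx, 4))
```

Repeating the comparison with larger n (and the same p) shows the gaps shrinking, which is the behavior the central limit theorem describes.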
48. т 
io, Th r 
‘On, we м. i-square Distribution. Before discussing the distribu- 
umber, р uce the following standard notation. If kis any positive 
pat if м (A) is defined as "ye? dx. Integrating by parts, We find 
pe ie, Tk) = (k “pre —1). Since it is easily verified that 
€ note t e have that T(k) = (k — D! whenever k is ® positive integer. 
If the кш T4) = Vs 
ance variable X has the pdf / (9 given as follows: 


ДО) = BoE sete forx > 0 
T(k/2) 
ti Cre ki f(x) = 0 
ion. Ba PUR 
ois е ы integer, then X is said 
meter К, or more commonly; 


for x € 0 
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to have 4 chi-squar 
square 


Xis said to have à chi- 
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distribution with k degrees of freedom. Thus the term “degrees of 
freedom” refers to the parameter of a chi-square distribution. 
If X has a chi-square distribution with k degrees of freedom; 


gr fe 
М.Х = Ее} = Í е'®х®Ї®-1ө-—(ї®) dx 
x)= Ee =F I, 
о) [o 
— ~ | xë! eT (2/2). 20) dx 
T(k/2) Јо 
This integral does not exist unless 1 — 2r is positive, so for the remainder 
of this section we assume that 1 — 27 > 0, or t < М. Then, making 
the change of variable у = (x/2)(1 — 27) in the integral, we find 
ikg 1 EE رو‎ = (k/2) 
M x(t) = (1 — 2t w | ыды ти 2 —-(1—20* 
i meme ° I" 
so that Мх() = (1 — 20% if t < 14, and Му() does not exist if 


t2 7. 


Suppose \(X_1, X_2, \ldots, X_n\) are independent chance variables, each with a chi-square distribution, with \(k_1, k_2, \ldots, k_n\) degrees of freedom, respectively. Then the chance variable \(Z = X_1 + X_2 + \cdots + X_n\) has a chi-square distribution with \(k_1 + k_2 + \cdots + k_n\) degrees of freedom. This is shown easily by finding \(M_Z(t)\). By Lemma 2 of Sec. 4.7,

\[ M_Z(t) = M_{X_1}(t)\,M_{X_2}(t)\cdots M_{X_n}(t) = (1-2t)^{-k_1/2}(1-2t)^{-k_2/2}\cdots(1-2t)^{-k_n/2} = (1-2t)^{-(k_1+k_2+\cdots+k_n)/2} \]

if \(t < 1/2\). But this is the moment generating function for a chance variable with a chi-square distribution with \(k_1 + k_2 + \cdots + k_n\) degrees of freedom, so that by Theorem 1 of Sec. 4.1, \(Z\) has a chi-square distribution with \(k_1 + k_2 + \cdots + k_n\) degrees of freedom.
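The additivity of degrees of freedom can also be seen directly from the pdf's. The sketch below is an illustration rather than the argument used in the text: for two independent chi-square variables it computes the pdf of the sum by numerical convolution and compares it with the chi-square pdf having the summed degrees of freedom. The choice of 4 and 4 degrees of freedom and the evaluation point are assumptions made for the example.

```python
import math

def chi_square_pdf(x, k):
    """pdf of the chi-square distribution with k degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def simpson(func, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def pdf_of_sum(z, k1, k2, n=200):
    """pdf of X1 + X2 at z, as the convolution of the two chi-square pdf's."""
    return simpson(lambda u: chi_square_pdf(u, k1) * chi_square_pdf(z - u, k2), 0.0, z, n)

z = 3.0                                   # illustrative evaluation point
by_convolution = pdf_of_sum(z, 4, 4)      # X1, X2 each with 4 degrees of freedom
direct = chi_square_pdf(z, 8)             # chi-square with 4 + 4 = 8 degrees of freedom
print(by_convolution, direct)
```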
Suppose \(Y_1, Y_2, \ldots, Y_n\) are independent chance variables, each with a standard normal distribution. Then the chance variable \(W = Y_1^2 + \cdots + Y_n^2\) has a chi-square distribution with \(n\) degrees of freedom. To show this, we note that by the preceding paragraph it suffices to show that the distribution of \(Y_1^2\) is a chi-square distribution with 1 degree of freedom. The pdf for \(Y_1\) is \((1/\sqrt{2\pi})\,e^{-y^2/2}\), and therefore by the discussion in Sec. 3.8, the pdf for \(R = Y_1^2\) is

\[ \frac{1}{\sqrt{2\pi}}\,r^{-1/2}e^{-r/2} = \frac{r^{1/2-1}e^{-r/2}}{2^{1/2}\,\Gamma(1/2)} \quad \text{for } r > 0, \qquad 0 \quad \text{for } r \le 0 \]

Therefore \(R\) has a chi-square distribution with 1 degree of freedom, and
W has a chi-square distribution with n degrees of freedom. Е{Ү:) 


=?) 
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i-square distribution in апу simple form. 




be shown without difficulty that if \(W\) has a noncentral chi-square distribution with parameters \(n\) and \(m\), and if \(w\) is any finite positive number, \(P(W < w)\) decreases as \(m\) increases. To show this, we can assume \(m_1 = \sqrt{m}\), \(m_2 = \cdots = m_n = 0\), and note that \(P(X_1^2 < x)\) decreases as \(m_1\) increases, for any positive value \(x\).


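4.10. The t Distribution. Suppose \(X\), \(Y\) are independent chance variables, \(X\) having a standard normal distribution, \(Y\) having a chi-square distribution with \(n\) degrees of freedom. We define the chance variable \(T\) as \(\sqrt{n}\,X/\sqrt{Y}\) and want to find the pdf for \(T\). To do this, we follow the method described in Sec. 3.12 and introduce the extra chance variable \(W\), defined as equal to \(X\). Next we find the joint pdf for \(T\), \(W\), which we denote by \(g(t,w)\). Since \(X\) and \(Y\) are independent, the joint pdf for \(X\), \(Y\), say, \(f(x,y)\), is the product of the marginal pdf's, or

\[ f(x,y) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,\frac{y^{n/2-1}e^{-y/2}}{2^{n/2}\,\Gamma(n/2)} \quad \text{for } y > 0, \qquad f(x,y) = 0 \quad \text{for } y \le 0 \]

Since \(X = W\) and \(Y = nW^2/T^2\), we have that the determinant \(J(w,t)\) is

\[ J(w,t) = \begin{vmatrix} \partial x/\partial w & \partial x/\partial t \\ \partial y/\partial w & \partial y/\partial t \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ 2nw/t^2 & -2nw^2/t^3 \end{vmatrix} = -\frac{2nw^2}{t^3} \]

We note that \(T\) and \(W\) must have the same sign, since \(WT = \sqrt{n}\,X^2/\sqrt{Y}\). Therefore, if \(t > 0\),

\[ g(t,w) = \frac{n^{n/2}\,w^n}{\sqrt{2\pi}\,\Gamma(n/2)\,2^{n/2-1}\,t^{n+1}}\exp\left[-\frac{w^2}{2}\left(1+\frac{n}{t^2}\right)\right] \quad \text{if } w > 0, \qquad g(t,w) = 0 \quad \text{if } w \le 0 \]

Therefore, if \(t > 0\), the pdf for \(T\), say, \(k(t)\), is equal to

\[ k(t) = \int_0^\infty \frac{n^{n/2}\,w^n}{\sqrt{2\pi}\,\Gamma(n/2)\,2^{n/2-1}\,t^{n+1}}\exp\left[-\frac{w^2}{2}\left(1+\frac{n}{t^2}\right)\right]dw \]

and making the change of variable \(z = (w^2/2)(1 + n/t^2)\), we find that

\[ k(t) = \frac{\Gamma\big((n+1)/2\big)}{\sqrt{\pi n}\,\Gamma(n/2)}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2} \quad \text{if } t > 0 \]

If \(t < 0\), then the absolute value of \(J(w,t)\) is \(-(2nw^2/t^3)\), and \(g(t,w)\) is given by

\[ g(t,w) = -\frac{n^{n/2}\,w^n}{\sqrt{2\pi}\,\Gamma(n/2)\,2^{n/2-1}\,t^{n+1}}\exp\left[-\frac{w^2}{2}\left(1+\frac{n}{t^2}\right)\right] \quad \text{if } w < 0, \qquad g(t,w) = 0 \quad \text{if } w \ge 0 \]

Therefore, if \(t < 0\), \(k(t)\) is equal to

\[ k(t) = \int_{-\infty}^0 -\frac{n^{n/2}\,w^n}{\sqrt{2\pi}\,\Gamma(n/2)\,2^{n/2-1}\,t^{n+1}}\exp\left[-\frac{w^2}{2}\left(1+\frac{n}{t^2}\right)\right]dw \]

and we find that

\[ k(t) = \frac{\Gamma\big((n+1)/2\big)}{\sqrt{\pi n}\,\Gamma(n/2)}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2} \quad \text{if } t < 0 \]

A chance variable with the pdf \(k(t)\) is said to have a t distribution with \(n\) degrees of freedom.

If \(T\) has a t distribution with \(n\) degrees of freedom, Table 3 in the Appendix gives the value of \(t\) for which \(P(-t < T < t) = A\), for various values of \(n\) and \(A\). For example, if \(n = 5\) and \(A = 0.90\), the desired value of \(t\) is 2.015.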
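The tabled value quoted above can be checked against the pdf of the t distribution. The following sketch, an illustration that is not in the original text, integrates the t pdf with \(n = 5\) degrees of freedom over \((-2.015, 2.015)\) by Simpson's rule; the function names and the number of subintervals are assumptions.

```python
import math

def t_pdf(t, n):
    """pdf of the t distribution with n degrees of freedom."""
    c = math.gamma((n + 1) / 2) / (math.sqrt(math.pi * n) * math.gamma(n / 2))
    return c * (1 + t * t / n) ** (-(n + 1) / 2)

def simpson(func, a, b, n_steps):
    """Composite Simpson's rule on [a, b] with n_steps (even) subintervals."""
    h = (b - a) / n_steps
    total = func(a) + func(b)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

degrees, t_value = 5, 2.015
coverage = simpson(lambda t: t_pdf(t, degrees), -t_value, t_value, 2000)
print(coverage)   # should be close to A = 0.90
```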


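4.11. The F Distribution. Suppose the chance variables \(X\), \(Y\) are independent, each having a chi-square distribution, with \(r\) degrees of freedom for \(X\), \(s\) degrees of freedom for \(Y\). We define the chance variable \(Z\) as \(sX/rY\). We introduce the extra chance variable \(W\), defined as equal to \(Y\), and find the joint pdf \(g(w,z)\) of \(W\), \(Z\). Since \(X = rZW/s\) and \(Y = W\), the determinant \(J(w,z)\) is

\[ J(w,z) = \begin{vmatrix} \partial x/\partial w & \partial x/\partial z \\ \partial y/\partial w & \partial y/\partial z \end{vmatrix} = \begin{vmatrix} rz/s & rw/s \\ 1 & 0 \end{vmatrix} = -\frac{rw}{s} \]

Also, \(X\), \(Y\) are independent, and so the joint pdf \(f(x,y)\) for \(X\), \(Y\) is

\[ f(x,y) = \frac{x^{r/2-1}\,y^{s/2-1}\,e^{-(x+y)/2}}{2^{(r+s)/2}\,\Gamma(r/2)\,\Gamma(s/2)} \quad \text{for } x > 0 \text{ and } y > 0, \qquad f(x,y) = 0 \quad \text{for } x \le 0 \text{ or } y \le 0 \]

Then

\[ g(w,z) = \frac{(r/s)^{r/2}\,z^{r/2-1}\,w^{(r+s)/2-1}}{2^{(r+s)/2}\,\Gamma(r/2)\,\Gamma(s/2)}\exp\left[-\frac{w}{2}\left(1+\frac{rz}{s}\right)\right] \quad \text{for } w > 0 \text{ and } z > 0, \qquad g(w,z) = 0 \quad \text{for } w \le 0 \text{ or } z \le 0 \]

The pdf for \(Z\), say, \(h(z)\), is, for positive \(z\), equal to

\[ h(z) = \int_0^\infty g(w,z)\,dw = \frac{\Gamma\big((r+s)/2\big)}{\Gamma(r/2)\,\Gamma(s/2)}\left(\frac{r}{s}\right)^{r/2} z^{r/2-1}\left(1+\frac{rz}{s}\right)^{-(r+s)/2}, \qquad h(z) = 0 \quad \text{for } z \le 0 \]

A chance variable with the pdf \(h(z)\) is said to have an F distribution with \(r\) degrees of freedom in the numerator and \(s\) degrees of freedom in the denominator.

If \(Z\) has an F distribution with \(r\) degrees of freedom in the numerator and \(s\) degrees of freedom in the denominator, Table 4 in the Appendix gives the value of \(z\) for which \(P(Z > z) = A\), for various values of \(r\), \(s\), and \(A\).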
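As a numerical illustration of the F pdf, the sketch below uses the standard form \(h(z) = \frac{\Gamma((r+s)/2)}{\Gamma(r/2)\Gamma(s/2)}(r/s)^{r/2}z^{r/2-1}(1+rz/s)^{-(r+s)/2}\) for \(z > 0\). In the special case \(r = 2\), direct integration gives the closed form \(P(Z > z) = (1 + 2z/s)^{-s/2}\), so an interval probability can be checked without any table. The case \(r = 2\), \(s = 10\), the interval endpoints, and the helper names are assumptions made for this example.

```python
import math

def f_pdf(z, r, s):
    """pdf of the F distribution with r and s degrees of freedom, for z > 0."""
    c = math.gamma((r + s) / 2) / (math.gamma(r / 2) * math.gamma(s / 2))
    return c * (r / s) ** (r / 2) * z ** (r / 2 - 1) * (1 + r * z / s) ** (-(r + s) / 2)

def simpson(func, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def upper_tail_r2(z, s):
    """P(Z > z) when r = 2, obtained by integrating the pdf in closed form."""
    return (1 + 2 * z / s) ** (-s / 2)

r, s = 2, 10
z_low, z_high = 1.0, 4.1                       # illustrative interval
numeric = simpson(lambda z: f_pdf(z, r, s), z_low, z_high, 2000)
closed = upper_tail_r2(z_low, s) - upper_tail_r2(z_high, s)
print(numeric, closed)
```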


4.12. The Noncentral F Distribution. Suppose \(X\), \(Y\) are independent chance variables, \(X\) having a noncentral chi-square distribution with \(r\) degrees of freedom and noncentrality parameter \(m\), and \(Y\) having a chi-square distribution with \(s\) degrees of freedom. Then the distribution of the chance variable \(Z = sX/rY\) is called a noncentral F distribution with \(r\) degrees of freedom in the numerator, \(s\) degrees of freedom in the denominator, and noncentrality parameter \(m\). It is impossible to write the pdf for \(Z\) in any simple form, but it can be shown that if \(z\) is any finite positive number, \(P(Z < z)\) decreases as the parameter \(m\) increases.

4.13. Moments in the Multivariate Case. If \(X, Y, Z, \ldots\) are jointly distributed chance variables, we have various “mixed moments”: \(E\{XY\}\), \(E\{X^2Z\}\), etc. In particular, the moment \(E\{(X - E\{X\})(Y - E\{Y\})\}\), if it exists, is called the “covariance between \(X\) and \(Y\)” and is usually denoted by \(\sigma_{XY}\). We note that \(\sigma_{XY} = E\{XY - Y\,E\{X\} - X\,E\{Y\} + E\{X\}E\{Y\}\} = E\{XY\} - E\{X\}E\{Y\}\). If \(\sigma_{XY} = 0\), \(X\) and \(Y\) are said to be “uncorrelated.” If \(X\) and \(Y\) are independent, they are uncorrelated; but uncorrelated chance variables are not necessarily independent.

The standard deviation of \(X\) is commonly denoted by \(\sigma_X\), and the standard deviation of \(Y\) by \(\sigma_Y\). Assuming that the quantities involved exist, \(\sigma_{XY}/(\sigma_X\sigma_Y)\) is called “the correlation coefficient between \(X\) and \(Y\)” and is usually denoted by \(\rho_{XY}\). \(\rho_{XY}\) can never be below \(-1\) or above 1, as is shown by the following argument. For any given value \(u\), the chance variable \([u(X - E\{X\}) + (Y - E\{Y\})]^2\) is never negative, and therefore its expected value, which is equal to \(u^2\sigma_X^2 + 2u\sigma_{XY} + \sigma_Y^2\), is nonnegative. But this means that the quadratic equation in \(u\), \(u^2\sigma_X^2 + 2u\sigma_{XY} + \sigma_Y^2 = 0\), has at most one real root, or else \(u^2\sigma_X^2 + 2u\sigma_{XY} + \sigma_Y^2\) would become negative for some values of \(u\). This in turn means that \(4\sigma_{XY}^2 - 4\sigma_X^2\sigma_Y^2 \le 0\), or \(\sigma_{XY}^2/(\sigma_X^2\sigma_Y^2) \le 1\), or \(\rho_{XY}^2 \le 1\), or \(-1 \le \rho_{XY} \le 1\).
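The remark that uncorrelated chance variables need not be independent can be illustrated with a small discrete example. The sketch below is an added illustration, and the particular distribution is an assumption chosen for simplicity: \(X\) takes the values \(-1, 0, 1\) with probability 1/3 each and \(Y = X^2\). Exact fractions are used so that the covariance comes out as exactly zero.

```python
from fractions import Fraction

# Hypothetical joint distribution: X is -1, 0, or 1 with probability 1/3 each, Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
covariance = expect(lambda x, y: x * y) - mean_x * mean_y   # E{XY} - E{X}E{Y}

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print(covariance, p_x0_y0, p_x0 * p_y0)
```

Here the covariance is zero, yet the joint probability of \(X = 0\), \(Y = 0\) differs from the product of the marginal probabilities, so \(X\) and \(Y\) are uncorrelated but not independent.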

4.14. Multivariate Moment Generating Functions. If \(X_1, X_2, \ldots, X_n\) are jointly distributed chance variables, then \(E\{e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n}\}\), as a function of the variables \(t_1, t_2, \ldots, t_n\), is called the “joint moment generating function for \(X_1, X_2, \ldots, X_n\)” and will be denoted by \(M(t_1, t_2, \ldots, t_n)\). If \(E\{X_1^{s_1}X_2^{s_2}\cdots X_n^{s_n}\}\) exists for nonnegative integers \(s_1, s_2, \ldots, s_n\), then

\[ E\{X_1^{s_1}X_2^{s_2}\cdots X_n^{s_n}\} = \left.\frac{\partial^{\,s_1+s_2+\cdots+s_n}}{\partial t_1^{s_1}\,\partial t_2^{s_2}\cdots\partial t_n^{s_n}}\,M(t_1,\ldots,t_n)\right|_{t_1=t_2=\cdots=t_n=0} \]

We have a generalization of Theorem 1 of Sec. 4.1:

Theorem. If \(M(t_1, \ldots, t_n)\) is a joint moment generating function for a set of \(n\) chance variables, and if there is a positive value \(h\) such that \(M(t_1, \ldots, t_n)\) is finite for all sets of values \((t_1, \ldots, t_n)\) with \(-h < t_i < h\) for \(i = 1, \ldots, n\), then there is exactly one probability distribution corresponding to \(M(t_1, \ldots, t_n)\).

If \(X_1, \ldots, X_n\) have the joint moment generating function \(M(t_1, \ldots, t_n)\), then the moment generating function for \(X_1\) is equal to \(M(t_1, 0, \ldots, 0)\). In general, to find the joint moment generating function for a subset of \(X_1, X_2, \ldots, X_n\), it is only necessary to set the \(t\)'s corresponding to all other \(X\)'s equal to zero.


4.15. The Joint Distribution of Certain Functions of Normal Chance Variables. Suppose \(X_1, \ldots, X_n\) are independent chance variables, each with a normal distribution with mean \(\mu\) and standard deviation \(\sigma\):

\(W\) denotes \(\dfrac{1}{n}(X_1 + \cdots + X_n)\)

\(Z\) denotes \(\dfrac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - W)^2\)

We shall show that \(W\) and \(Z\) are independent chance variables, \(W\) having a normal distribution with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\), \(Z\) having a chi-square distribution with \(n - 1\) degrees of freedom.

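As a first step, we find the joint pdf for the chance variables \(W\), \(T_1 = X_1 - W\), \(T_2 = X_2 - W\), \(\ldots\), \(T_{n-1} = X_{n-1} - W\). Solving for \(X_1, X_2, \ldots, X_n\), we find \(X_1 = T_1 + W\), \(X_2 = T_2 + W\), \(\ldots\), \(X_{n-1} = T_{n-1} + W\), \(X_n = nW - (T_1 + W) - (T_2 + W) - \cdots - (T_{n-1} + W) = W - (T_1 + T_2 + \cdots + T_{n-1})\). Then the determinant \(J(w, t_1, \ldots, t_{n-1})\) is

\[ \begin{vmatrix} \partial x_1/\partial w & \partial x_1/\partial t_1 & \cdots & \partial x_1/\partial t_{n-1} \\ \partial x_2/\partial w & \partial x_2/\partial t_1 & \cdots & \partial x_2/\partial t_{n-1} \\ \vdots & \vdots & & \vdots \\ \partial x_n/\partial w & \partial x_n/\partial t_1 & \cdots & \partial x_n/\partial t_{n-1} \end{vmatrix} = \begin{vmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ 1 & -1 & -1 & \cdots & -1 \end{vmatrix} = C_n \]

where \(C_n\) is some value depending only on \(n\), whose exact value does not concern us. The joint pdf for \(X_1, X_2, \ldots, X_n\) is the product of their separate pdf's and is therefore

\[ \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right] \]

A little algebraic manipulation shows that, with \(w = (x_1 + \cdots + x_n)/n\) and \(t_i = x_i - w\),

\[ \sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}(x_i-w)^2 + n(w-\mu)^2 \]

and that

\[ \sum_{i=1}^{n}(x_i-w)^2 = \sum_{i=1}^{n-1}t_i^2 + \left(\sum_{i=1}^{n-1}t_i\right)^2 \]

Therefore the joint pdf for \(W, T_1, \ldots, T_{n-1}\) is

\[ |C_n|\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\frac{n}{2\sigma^2}(w-\mu)^2\right]\exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n-1}t_i^2 + \left(\sum_{i=1}^{n-1}t_i\right)^2\right]\right\} \]

This joint pdf can be written as the product of two factors, one factor involving only \(w\), the other factor involving only \(t_1, \ldots, t_{n-1}\). This shows that the chance variable \(W\) is independent of the chance variables \(T_1, \ldots, T_{n-1}\), and therefore \(W\) must be independent of \(Z\), since \(Z\) can be written purely in terms of \(T_1, \ldots, T_{n-1}\). The factor in the joint pdf involving \(w\) is, up to a constant, the marginal pdf for \(W\), and this shows that \(W\) has a normal distribution with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).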

All that remains to be shown is that \(Z\) has a chi-square distribution with \(n - 1\) degrees of freedom. We show this by finding the moment generating function for \(Z\). Since \(X_1, \ldots, X_n\) are independent with normal distributions with parameters \(\mu\), \(\sigma\), the chance variables \((X_1 - \mu)/\sigma, \ldots, (X_n - \mu)/\sigma\) are independent with standard normal distributions, and

\[ \sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 \]

has a chi-square distribution with \(n\) degrees of freedom. We have seen that

\[ \sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2 = Z + \left(\frac{W-\mu}{\sigma/\sqrt{n}}\right)^2 \]

and that \(Z\) and \(W\) are independent. Therefore

\[ E\left\{\exp\left[t\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2\right]\right\} = E\{\exp(tZ)\}\;E\left\{\exp\left[t\left(\frac{W-\mu}{\sigma/\sqrt{n}}\right)^2\right]\right\} \]

Since \([(W - \mu)/(\sigma/\sqrt{n})]^2\) has a chi-square distribution with 1 degree of freedom, \((1-2t)^{-n/2} = E\{\exp(tZ)\}\,(1-2t)^{-1/2}\), or \(E\{\exp(tZ)\} = (1-2t)^{-(n-1)/2}\) if \(t < 1/2\). But \((1-2t)^{-(n-1)/2}\) is the moment generating function for the chi-square distribution with \(n - 1\) degrees of freedom, showing that \(Z\) has a chi-square distribution with \(n - 1\) degrees of freedom.
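The decomposition used in this argument, \(\sum_i ((x_i-\mu)/\sigma)^2 = z + ((w-\mu)/(\sigma/\sqrt{n}))^2\) with \(w\) the mean of the \(x_i\) and \(z = \sum_i (x_i - w)^2/\sigma^2\), is a purely algebraic identity and can be checked exactly on any data. The sketch below does so with exact fractions; the data values and the parameters \(\mu = 5\), \(\sigma = 2\) are hypothetical. It also checks that \(\sum_i (x_i - w)^2\) can be written in terms of \(t_i = x_i - w\), \(i = 1, \ldots, n-1\), alone.

```python
from fractions import Fraction

# Hypothetical observed values and parameters, chosen only for illustration.
x = [Fraction(v) for v in (3, 7, 8, 12)]
mu, sigma = Fraction(5), Fraction(2)

n = len(x)
w = sum(x) / n                                   # observed value of W
z = sum((xi - w) ** 2 for xi in x) / sigma ** 2  # observed value of Z

# Left side: sum of squared standardized observations.
total = sum(((xi - mu) / sigma) ** 2 for xi in x)
# Right side: Z plus the squared standardized mean; note
# ((w - mu) / (sigma / sqrt(n)))**2 equals n * (w - mu)**2 / sigma**2.
decomposed = z + n * (w - mu) ** 2 / sigma ** 2

# Z depends on the data only through t_1, ..., t_{n-1}.
t = [xi - w for xi in x[:-1]]
z_from_t = (sum(ti ** 2 for ti in t) + sum(t) ** 2) / sigma ** 2

print(total, decomposed, z, z_from_t)
```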


From the discussion just completed and from Sec. 4.10, it follows that the chance variable

\[ \frac{\sqrt{n-1}\;(W-\mu)/(\sigma/\sqrt{n})}{\sqrt{Z}} = \frac{\sqrt{n(n-1)}\;(W-\mu)}{\sqrt{\sum_{i=1}^{n}(X_i-W)^2}} \]

has a t distribution with \(n - 1\) degrees of freedom.

4.16. The Bivariate Normal Distribution. If the chance variables \(X_1\), \(X_2\) have the joint pdf

\[ f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{x_1-\mu_1}{\sigma_1}\,\frac{x_2-\mu_2}{\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\} \]

where \(\sigma_1 > 0\), \(\sigma_2 > 0\), \(\rho^2 < 1\), then the pair \(X_1\), \(X_2\) is said to have a bivariate normal distribution with parameters \(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho\). We note that if \(\rho = 0\), then \(X_1\) and \(X_2\) are independent, \(X_1\) having a normal distribution with mean \(\mu_1\) and standard deviation \(\sigma_1\), and \(X_2\) having a normal distribution with mean \(\mu_2\) and standard deviation \(\sigma_2\).

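Now we find the joint moment generating function for \(X_1\), \(X_2\).

\[ M(t_1,t_2) = E\{e^{t_1X_1+t_2X_2}\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1+t_2x_2}\,f(x_1,x_2)\,dx_1\,dx_2 \]

Substituting the function \(f(x_1,x_2)\) as given above, and then making the change of variables

\[ v = \frac{1}{\sqrt{2}}\,\frac{x_1-\mu_1}{\sigma_1} + \frac{1}{\sqrt{2}}\,\frac{x_2-\mu_2}{\sigma_2}, \qquad w = \frac{1}{\sqrt{2}}\,\frac{x_1-\mu_1}{\sigma_1} - \frac{1}{\sqrt{2}}\,\frac{x_2-\mu_2}{\sigma_2} \]

we find

\[ M(t_1,t_2) = \frac{\exp(\mu_1t_1+\mu_2t_2)}{2\pi\sqrt{1-\rho^2}}\left\{\int_{-\infty}^{\infty}\exp\left[\frac{v}{\sqrt{2}}(\sigma_1t_1+\sigma_2t_2) - \frac{v^2}{2(1+\rho)}\right]dv\right\}\left\{\int_{-\infty}^{\infty}\exp\left[\frac{w}{\sqrt{2}}(\sigma_1t_1-\sigma_2t_2) - \frac{w^2}{2(1-\rho)}\right]dw\right\} \]

Each of the integrals in this expression can be evaluated by the method used to find the moment generating function in Sec. 4.6. The final result is

\[ M(t_1,t_2) = \exp\left[\mu_1t_1 + \mu_2t_2 + \tfrac{1}{2}\left(\sigma_1^2t_1^2 + \sigma_2^2t_2^2 + 2\rho\sigma_1\sigma_2t_1t_2\right)\right] \]

From this, we find

\[ E\{X_1\} = \left.\frac{\partial}{\partial t_1}M(t_1,t_2)\right|_{t_1=t_2=0} = \mu_1, \qquad E\{X_2\} = \left.\frac{\partial}{\partial t_2}M(t_1,t_2)\right|_{t_1=t_2=0} = \mu_2 \]

\[ E\{X_1^2\} = \left.\frac{\partial^2}{\partial t_1^2}M(t_1,t_2)\right|_{t_1=t_2=0} = \sigma_1^2 + \mu_1^2, \qquad E\{X_2^2\} = \left.\frac{\partial^2}{\partial t_2^2}M(t_1,t_2)\right|_{t_1=t_2=0} = \sigma_2^2 + \mu_2^2 \]

\[ E\{X_1X_2\} = \left.\frac{\partial^2}{\partial t_1\,\partial t_2}M(t_1,t_2)\right|_{t_1=t_2=0} = \rho\sigma_1\sigma_2 + \mu_1\mu_2 \]

Recalling the definitions of variance, covariance, and correlation coefficient, we see that \(\sigma_1^2\) is the variance of \(X_1\), \(\sigma_2^2\) is the variance of \(X_2\), and \(\rho\) is the correlation coefficient between \(X_1\) and \(X_2\).

Setting \(t_2 = 0\) in \(M(t_1,t_2)\), we find that the marginal distribution for \(X_1\) is normal with mean \(\mu_1\) and standard deviation \(\sigma_1\). Setting \(t_1 = 0\) in \(M(t_1,t_2)\), we find that the marginal distribution for \(X_2\) is normal with mean \(\mu_2\) and standard deviation \(\sigma_2\).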
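The moments of the bivariate normal distribution can be recovered numerically from its joint moment generating function \(M(t_1,t_2) = \exp[\mu_1t_1 + \mu_2t_2 + \frac{1}{2}(\sigma_1^2t_1^2 + \sigma_2^2t_2^2 + 2\rho\sigma_1\sigma_2t_1t_2)]\). The following sketch approximates the partial derivatives at \(t_1 = t_2 = 0\) by central differences; the parameter values and the step size `h` are assumptions made for the illustration.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
mu1, mu2, sigma1, sigma2, rho = 1.0, -0.5, 2.0, 1.5, 0.3

def M(t1, t2):
    """Joint moment generating function of the bivariate normal distribution."""
    quad = sigma1 ** 2 * t1 ** 2 + sigma2 ** 2 * t2 ** 2 + 2 * rho * sigma1 * sigma2 * t1 * t2
    return math.exp(mu1 * t1 + mu2 * t2 + 0.5 * quad)

h = 1e-3  # finite-difference step (an assumption; small but not so small that rounding dominates)

# First and second moments as central-difference approximations of derivatives at (0, 0).
e_x1 = (M(h, 0) - M(-h, 0)) / (2 * h)
e_x2 = (M(0, h) - M(0, -h)) / (2 * h)
e_x1_sq = (M(h, 0) - 2 * M(0, 0) + M(-h, 0)) / h ** 2
e_x2_sq = (M(0, h) - 2 * M(0, 0) + M(0, -h)) / h ** 2
e_x1x2 = (M(h, h) - M(h, -h) - M(-h, h) + M(-h, -h)) / (4 * h * h)

variance_x1 = e_x1_sq - e_x1 ** 2
variance_x2 = e_x2_sq - e_x2 ** 2
covariance = e_x1x2 - e_x1 * e_x2
correlation = covariance / math.sqrt(variance_x1 * variance_x2)

print(e_x1, e_x2, variance_x1, variance_x2, correlation)
```

The printed values should be close to \(\mu_1\), \(\mu_2\), \(\sigma_1^2\), \(\sigma_2^2\), and \(\rho\), respectively.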


4.17. “Length-of-life” Distributions. Suppose the chance variable \(X\) is the total length of life of a piece of equipment about which the following assumption is made: the conditional probability that the equipment fails (“dies”) during the time interval \((t, t + \Delta t)\), given that it has not failed before time \(t\), is equal to \(r(t)\,\Delta t + q(t, \Delta t)\), where \(r(t)\) is a given nonnegative function of \(t\), and \(q(t, \Delta t)/\Delta t\) approaches zero as \(\Delta t\) approaches zero, uniformly in \(t\). Time is measured from the moment of “birth” of the equipment. Under this assumption, we are going to find the cdf and pdf for \(X\).

From the description of the problem, it is clear that \(P(X < 0) = 0\). Let \(x\) be a fixed positive value. We compute \(P(X > x)\) as follows. Break the interval \((0,x)\) into \(x/\Delta x\) subintervals, each of the subintervals having length \(\Delta x\). Clearly, \(P(X > x) = P\)(no failure in first subinterval and no failure in second subinterval and \(\cdots\) and no failure in last subinterval). But by our assumption, \(P\)(no failure in \(i\)th subinterval | no failure earlier) \(= 1 - r[(i-1)\Delta x]\,\Delta x - q((i-1)\Delta x, \Delta x)\). Therefore

\[ P(X > x) = \prod_{i=1}^{x/\Delta x}\big\{1 - r[(i-1)\Delta x]\,\Delta x - q((i-1)\Delta x, \Delta x)\big\} \]

Taking logarithms, we get

\[ \log P(X > x) = \sum_{i=1}^{x/\Delta x}\log\big\{1 - r[(i-1)\Delta x]\,\Delta x - q((i-1)\Delta x, \Delta x)\big\} \]

Expanding each logarithm in a Taylor series, we find

\[ \log P(X > x) = -\sum_{i=1}^{x/\Delta x} r[(i-1)\Delta x]\,\Delta x + Q(\Delta x) \]

where \(Q(\Delta x)\) approaches zero as \(\Delta x\) approaches zero. Letting \(\Delta x\) approach zero, we find

\[ \log P(X > x) = -\int_0^x r(u)\,du \]

or

\[ P(X > x) = \exp\left[-\int_0^x r(u)\,du\right] \]

or

\[ P(X \le x) = 1 - \exp\left[-\int_0^x r(u)\,du\right] \]

This gives the cdf for \(X\). The pdf for \(X\) is \(r(x)\exp\left[-\int_0^x r(u)\,du\right]\).

The cdf and pdf that we have found depend upon the function \(r(x)\). Looking back at the way \(r(x)\) enters the problem, it can be seen that \(r(x)\) can be interpreted as the “death rate” at time \(x\). It would seem that in most practical applications, \(r(x)\) should be an increasing function of \(x\), at least if \(x\) is large. This is true for human length of life. However, for lengths of life of vacuum tubes and electric-light bulbs, it is often assumed that \(r(x)\) is a positive constant, say, \(\theta\). In this case, the cdf and pdf are \(1 - e^{-\theta x}\) and \(\theta e^{-\theta x}\), respectively, for \(x > 0\). Naturally, the pdf and cdf are both equal to zero for all negative values of \(x\). This distribution is known as “the exponential distribution with parameter \(\theta\).”
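The limiting argument of this section can be imitated numerically. The sketch below multiplies the per-subinterval survival probabilities \(1 - r[(i-1)\Delta x]\,\Delta x\) over a fine grid and compares the product with \(\exp[-\int_0^x r(u)\,du]\). It is only an illustration: the remainder term \(q\) is taken to be zero, and the two death-rate functions (a constant \(\theta = 0.5\), and \(r(t) = 2t\)) are assumptions.

```python
import math

def survival_product(r, x, steps):
    """Product over subintervals of P(no failure in subinterval | no failure earlier).

    Assumes the remainder term q(t, dx) is identically zero.
    """
    dx = x / steps
    prob = 1.0
    for i in range(1, steps + 1):
        prob *= 1.0 - r((i - 1) * dx) * dx
    return prob

# Constant death rate theta: the limit is the exponential survival function exp(-theta * x).
theta = 0.5
constant_rate = survival_product(lambda t: theta, 3.0, 100_000)
exact_constant = math.exp(-theta * 3.0)

# Increasing death rate r(t) = 2t: the integral from 0 to x is x**2.
increasing_rate = survival_product(lambda t: 2.0 * t, 1.5, 100_000)
exact_increasing = math.exp(-(1.5 ** 2))

print(constant_rate, exact_constant)
print(increasing_rate, exact_increasing)
```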
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point (2,29, ...,2,) in C, we have by, + baya + *** + buy < Булу + 
\(b_2z_2 + \cdots + b_nz_n\).

Proof. We shall carry out the proof only for the case \(n = 2\). The basic idea of the proof is the same when \(n > 2\), but the required notation becomes cumbersome. For \(n = 2\), we represent the situation in Fig. 5.2.

[Fig. 5.1: Convex sets; Nonconvex sets]


In Fig. 5.2, we have drawn axes through the point \((y_1,y_2)\), labeled the axes \(x_1\) and \(x_2\), respectively, and labeled the quadrants in the usual way, noting that the quadrant \(Q_3\) is the same as the set of points \(Q(y_1,y_2)\).

[Fig. 5.2: axes \(x_1\), \(x_2\) through \((y_1,y_2)\), with quadrants \(Q_1\), \(Q_2\), \(Q_3\), \(Q_4\)]

Next we draw a line \(L_2\), starting at \((y_1,y_2)\) and going into \(Q_2\), with the following property: there are no points of \(C\) in \(Q_2\) and to the left of \(L_2\), but if \(L_2\) were rotated clockwise however slightly, there would be points of \(C\) in \(Q_2\) and to the left of \(L_2\). [If the vertical line through \((y_1,y_2)\) has no points of \(C\) to its left, then it is clear that \(y_1 \le z_1\) for each and every point \((z_1,z_2)\) in \(C\). This means that the theorem is true with \(b_1 = 1\), \(b_2 = 0\). Therefore we assume for the remainder of the discussion that \(L_2\) is not vertical.]

Next, we draw a line \(L_4\), starting at \((y_1,y_2)\) and going into \(Q_4\), with the following property: there are no points of \(C\) in \(Q_4\) and to the left of \(L_4\), but if \(L_4\) were rotated counterclockwise however slightly, there would be points of \(C\) in \(Q_4\) and to the left of \(L_4\). [If the horizontal line through \((y_1,y_2)\) has no points of \(C\) below it, then it is clear that \(y_2 \le z_2\) for each and every point \((z_1,z_2)\) in \(C\). This means that the theorem holds with \(b_1 = 0\), \(b_2 = 1\). Therefore we assume for the remainder of the discussion that \(L_4\) is not horizontal.]

We now have Fig. 5.3, where \(\theta\) is the angle between \(L_2\) and \(L_4\). We show that \(\theta\) cannot be less than 180°. For suppose it were, as in Fig. 5.4. Then, if \((p_1,p_2)\) is any point on \(L_2\) distinct from \((y_1,y_2)\), and \((q_1,q_2)\) is any point on \(L_4\) distinct from \((y_1,y_2)\), the line segment joining \((p_1,p_2)\) and \((q_1,q_2)\) would contain points of \(Q(y_1,y_2)\). But there is a point \((p_1,p_2)\) in \(C\) and either on \(L_2\) or arbitrarily close to \(L_2\), and a point \((q_1,q_2)\) in \(C\) and either on \(L_4\) or arbitrarily close to \(L_4\), because of the definition of \(L_2\) and \(L_4\). Then the line segment joining \((p_1,p_2)\) and \((q_1,q_2)\) contains points of \(Q(y_1,y_2)\), and all the points on this line segment are in \(C\), since \(C\) is convex. This contradicts our assumption that no point in \(Q(y_1,y_2)\) is in \(C\) and proves that \(\theta\) is at least 180°.

[Fig. 5.3] [Fig. 5.4]

Since \(\theta\) is at least 180°, we see that if the line \(L_2\) is extended indefinitely in both directions, no points of \(C\) are to the left of this extended \(L_2\). The equation of the line \(L_2\) can be written as \(a_1x_1 + a_2x_2 + d = 0\), for some constants \(a_1\), \(a_2\), \(d\). Since the slope of \(L_2\) is negative, \(a_1\) and \(a_2\) have the same sign, and \(a_1 + a_2\) is not equal to zero. Then we can write the equation of \(L_2\) as

\[ \frac{a_1}{a_1+a_2}\,x_1 + \frac{a_2}{a_1+a_2}\,x_2 = \frac{-d}{a_1+a_2} \]

Denoting \(a_1/(a_1 + a_2)\) by \(b_1\), \(a_2/(a_1 + a_2)\) by \(b_2\), \(-d/(a_1 + a_2)\) by \(k\), the equation of \(L_2\) can be written \(b_1x_1 + b_2x_2 = k\), where \(b_1\), \(b_2\) are nonnegative, and \(b_1 + b_2 = 1\). But no points of \(C\) are to the left of \(L_2\), and therefore if \((z_1,z_2)\) is any point of \(C\), \(b_1z_1 + b_2z_2 \ge k\). Since \((y_1,y_2)\) is on \(L_2\), we have \(b_1y_1 + b_2y_2 = k \le b_1z_1 + b_2z_2\), completing the proof of the theorem for the case \(n = 2\).

We note that the values of \((b_1,b_2)\) depend on the particular point \((y_1,y_2)\) we are dealing with. Also, for a fixed point \((y_1,y_2)\), the values of \((b_1,b_2)\) may or may not be uniquely determined: they are uniquely determined if and only if the angle \(\theta\) is exactly 180°.

As an example, if the set \(C\) is a circle and its interior, the line \(L_2\) is simply the tangent to the circle at \((y_1,y_2)\). As another example, if \(C\) is a square and its interior, and the point \((y_1,y_2)\) is the lower left-hand corner of the square, \(L_2\) can be the left boundary of the square, or the lower boundary of the square, or any line we get by rotating the lower boundary clockwise around \((y_1,y_2)\) through less than a right angle.
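The two examples can be checked numerically. The sketch below is an added illustration with assumed coordinates: the square is taken to be the unit square with lower left-hand corner \((0,0)\), and the circle has center \((1,1)\) and radius 1, with \((y_1,y_2)\) its lower-left boundary point. For the square, every pair \((b_1, b_2)\) with \(b_1, b_2 \ge 0\) and \(b_1 + b_2 = 1\) satisfies the inequality of the theorem; for the circle, only the tangent direction does.

```python
import math

def supports(b, y, points, tol=1e-12):
    """True if b1*y1 + b2*y2 <= b1*z1 + b2*z2 for every listed point z."""
    by = b[0] * y[0] + b[1] * y[1]
    return all(by <= b[0] * z[0] + b[1] * z[1] + tol for z in points)

# Square example: a linear function on a square is smallest at a corner,
# so checking the four corners is enough.
square_corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
corner = (0.0, 0.0)
candidates = [(i / 10, 1 - i / 10) for i in range(11)]
square_all_work = all(supports(b, corner, square_corners) for b in candidates)

# Circle example: boundary points sampled every degree (a linear function on a
# disk attains its minimum on the boundary).
circle_points = [(1 + math.cos(math.radians(a)), 1 + math.sin(math.radians(a)))
                 for a in range(360)]
y_circle = (1 - 1 / math.sqrt(2), 1 - 1 / math.sqrt(2))
tangent_works = supports((0.5, 0.5), y_circle, circle_points)
other_works = supports((0.3, 0.7), y_circle, circle_points)

print(square_all_work, tangent_works, other_works)
```

This matches the remark that \((b_1,b_2)\) is uniquely determined exactly when the angle \(\theta\) is 180°: at the corner of the square many choices work, while at a point of the circle only one does.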


5.2. Description of the General Problem of Statistics. The typical problem that a statistician is called upon to help solve may be briefly described as follows. \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\) is a set of \(m + n\) jointly distributed chance variables. Somebody has to choose one of a given set of possible decisions or actions after having observed \(X_1, \ldots, X_m\) but before observing \(Y_1, \ldots, Y_n\). It is known that the joint probability distribution of \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\) is one of a given set of possible probability distributions, but exactly which distribution is not known. After a particular decision is chosen, \(Y_1, \ldots, Y_n\) will be observed, and a loss will be incurred which depends on the decision chosen and the observed values of \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\). (A profit is regarded as a negative loss.) The problem is: which decision should be chosen after observing \(X_1, \ldots, X_m\)?

As an example of such a problem, suppose a company is formed which intends to build a factory to manufacture a new type of perishable foodstuff. The question is what the productive capacity of the factory should be. If the capacity is small, the factory will not cost much to build or operate, but then potential profits will be lost if demand for the product is greater than the capacity of the factory. If the capacity is large, the factory will be expensive to build and operate and may be idle a good part of the time if its capacity exceeds the demand for the product. In order to avoid the pitfalls of over- or undercapacity, it is decided to run a survey to estimate the potential demand for the product. This survey involves choosing certain persons at random and finding out how much of the product each would buy. In addition, the trend of population in the area, taxes, the effect of advertising, and other quantities may be measured. Here the possible decisions are the possible capacities of the factory. The chance variables \(X_1, \ldots, X_m\) are the various quantities measured in the survey, and \(Y_1, \ldots, Y_n\) are the demands that will be observed after the factory is built; that is, \(Y_i\) is the demand that will be observed in the \(i\)th accounting period after the factory is built, and \(n\) is taken large enough so that the \(n\) periods cover the life of the factory. The possible joint distributions of \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\) need not concern us at present, although people experienced in running surveys presumably know what sort of joint distributions apply.


5.3. Further Discussion of Statistical Problems. The description of the statistical problem given in Sec. 5.2 differs from the description usually given in one important respect. In the usual statement of the problem, the loss depends on the decision chosen, on the joint distribution of \(X_1, \ldots, X_m\), and on the observed values of \(X_1, \ldots, X_m\), but not on \(Y_1, \ldots, Y_n\) at all; in fact, the usual problem does not even mention the existence of \(Y_1, \ldots, Y_n\). However, it is difficult to think of practical problems for which the usual formulation would be reasonable. For example, let us go back to the problem of deciding what size factory to build. It is clear that what really determines the profitability of the factory is not the distribution of \(X_1, \ldots, X_m\) but rather the values of \(Y_1, \ldots, Y_n\). Indeed, the only reason we bother observing \(X_1, \ldots, X_m\) is to learn what the values \(Y_1, \ldots, Y_n\) may reasonably be expected to be.

A second reason for preferring the formulation of Sec. 5.2 to the usual formulation is the difficulty of imagining just what mechanism would or could disclose the actual joint distribution of \(X_1, \ldots, X_m\) to us, so that we may know what loss we have incurred. Except for certain artificial “game” situations where somebody knew what the actual joint distribution was but deliberately withheld the information until the time arrived for the loss to be incurred, any disclosures about the joint distribution of \(X_1, \ldots, X_m\) would be made by means of observing additional chance variables \(Y_1, \ldots, Y_n\) which are jointly distributed with \(X_1, \ldots, X_m\).

A third reason for preferring the formulation of Sec. 5.2 to the usual formulation is that the usual setup is a special case of the setup of Sec. 5.2, not vice versa. To see this, note that we could arrange things so that once the values of \(Y_1, \ldots, Y_n\) are observed, the joint distribution of \(X_1, \ldots, X_m\) is completely known. The artificiality of this procedure simply points up the artificiality of the usual formulation of the problem.

In spite of the objections we have raised to the usual formulation of the statistical problem, most of the published research and discussions use that formulation, and we return to it in a later chapter.

A point that is worth stressing is that what makes statistical problems particularly difficult is the fact that the joint probability distribution of \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\) is not completely known. Problems where this joint distribution is completely known are not considered statistical problems and are conceptually much easier to handle than statistical problems, though the computations required may be formidable. In many problems where the joint distribution is completely known, it is not necessary to observe \(X_1, \ldots, X_m\) before choosing a decision. Problems of this type appear in the exercises of earlier chapters.


5.4. Decision Rules. The statistician's role is usually not that of personally choosing a decision; the choice is a prerogative of the person or persons who are actually going to incur the loss or make the profit. The role of the statistician is to act as an adviser to the decision maker. Suppose a statistician is acting as adviser to a businessman who has to decide whether or not to open a branch office in a certain town. Perhaps the businessman has a strong aversion to the town, for reasons that have nothing to do with the desirability of opening the office there, and the statistician may be completely unaware of this aversion (even the businessman may be largely unaware of it). The effect of the aversion might be to make the businessman decide against opening the office, even in the face of the strongest evidence that opening the office would be desirable. How can the statistician avoid this unfortunate situation? (Of course, the businessman and not the statistician is the one who will suffer from a poor decision, but the statistician may well fear that his reputation would be damaged if he were associated with a disastrous decision.) The standard way of avoiding such a situation is for the statistician to insist that the businessman specify, before \(X_1, \ldots, X_m\) are observed, which observed values would cause him to decide to open the office and which observed values would cause him to decide not to open the office. Presumably, the businessman would recognize the irrationality of specifying that he would not open the office even though the observations showed that it would be very profitable to do so, if he were made to state what he would do before observing the values of \(X_1, \ldots, X_m\). However, if he weren't forced to state his intentions before observing the values of \(X_1, \ldots, X_m\), he could always claim that the values of \(X_1, \ldots, X_m\) actually observed simply weren't encouraging enough to justify opening the office.

Thus a rational analysis of the problem starts with the statistician obtaining from his employer a list showing which decision would be chosen for each possible set of observed values of \(X_1, \ldots, X_m\). This list is called a “decision rule.” The statistician's major task is to evaluate the goodness of any given decision rule, relative to other possible decision rules.

In the case of the businessman and his problem of whether or not to open a branch office, suppose \(X_1, \ldots, X_m\) are the demands in dollars for the product of \(m\) persons chosen at random in the town. One possible decision rule is to decide to open the office if \(\sum_{i=1}^{m} X_i\) > $10,000; otherwise, to decide not to open the office. A different possible decision rule is to decide to open the office if at least one-quarter of the observed values of \(X_1, \ldots, X_m\) were above $25. A great number of other possible decision rules clearly exists.
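A decision rule in this sense is simply a function from the observed values to a decision. The two rules just described can be sketched as follows; the function names, the string labels for the decisions, and the sample of observed demands are hypothetical and serve only to show that different rules can reach different decisions on the same observations.

```python
OPEN, DO_NOT_OPEN = "open", "do not open"

def rule_total_demand(x, threshold=10_000):
    """Open the office if the sum of the observed demands exceeds the threshold."""
    return OPEN if sum(x) > threshold else DO_NOT_OPEN

def rule_quarter_above(x, level=25):
    """Open the office if at least one-quarter of the observed values exceed the level."""
    above = sum(1 for value in x if value > level)
    return OPEN if 4 * above >= len(x) else DO_NOT_OPEN

# Hypothetical observed demands, in dollars, for m = 8 persons.
observed = [30, 10, 5, 40, 12, 8, 20, 15]

first_decision = rule_total_demand(observed)
second_decision = rule_quarter_above(observed)
print(first_decision, "|", second_decision)
```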


5.5. Decision Rules Using Randomization. In Sec. 5.4 we defined a decision rule as the assignment of a particular decision to each possible set of observed values of \(X_1, \ldots, X_m\). We are now going to generalize this definition and allow a decision rule to be an assignment of a set of probabilities of choosing the various decisions to each possible set of observed values of \(X_1, \ldots, X_m\). A decision rule of this more general type for the problem discussed in Sec. 5.4 is: Decide to open the office if \(\sum_{i=1}^{m} X_i\) > $10,000; decide not to open the office if \(\sum_{i=1}^{m} X_i\) < $10,000; if \(\sum_{i=1}^{m} X_i\) = $10,000, assign probability 1/4 to opening the office and probability 3/4 to not opening the office. This means that if the sum of the observations equals exactly $10,000, the businessman chooses his decision by using a random device that assigns probability 1/4 to opening the office. This decision rule is said to use “randomization.”

Decision rules which use randomization strike most people as very peculiar, and pictures can be drawn of the fate that would overtake the managers of a corporation if the stockholders found them deciding policy by tossing coins or rolling dice or spinning spinners. We shall see later that randomization is actually used only rarely. The introduction of randomized decision rules serves to simplify the mathematical development.
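The randomized rule described above can be sketched as a function that returns, for each set of observed values, the probabilities assigned to the two decisions, together with a random device that draws a decision from those probabilities. The function names and the use of Python's `random.Random` as the random device are assumptions made for the illustration.

```python
import random
from fractions import Fraction

def randomized_rule(x, threshold=10_000):
    """Return the probabilities assigned to each decision after observing x."""
    total = sum(x)
    if total > threshold:
        return {"open": Fraction(1), "do not open": Fraction(0)}
    if total < threshold:
        return {"open": Fraction(0), "do not open": Fraction(1)}
    # The sum equals the threshold exactly: randomize with probabilities 1/4 and 3/4.
    return {"open": Fraction(1, 4), "do not open": Fraction(3, 4)}

def choose(probabilities, rng):
    """Draw one decision using a random device with the given probabilities."""
    u = rng.random()
    cumulative = Fraction(0)
    decision = None
    for decision, p in probabilities.items():
        cumulative += p
        if u < cumulative:
            return decision
    return decision  # guards against rounding at the upper end

tie_probabilities = randomized_rule([4000, 6000])     # sum is exactly $10,000
clear_probabilities = randomized_rule([9000, 6000])   # sum exceeds $10,000
decision = choose(tie_probabilities, random.Random(0))
print(tie_probabilities, decision)
```

An ordinary (nonrandomized) decision rule is the special case in which every assigned probability is 0 or 1.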


5.6. Notation for Statistical Decision Problems. A convenient system of notation is a necessity for handling the variety of problems we shall meet, and so we shall develop our notation rather carefully. There are several things to keep track of: the different possible joint distributions of \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\), the different possible sets of values of \(X_1, \ldots, X_m, Y_1, \ldots, Y_n\), and the different possible decisions.

We shall use the symbol \(\theta\) as an index for the possible joint distributions; that is, a particular value of \(\theta\) picks out a particular distribution. In the case where there is only a finite number of possible joint distributions, say, \(h\) of them, we can list them in some definite order, and let \(\theta\) range over the integers from 1 to \(h\). Then, when \(\theta = i\), it means that we are dealing with the \(i\)th distribution in our list. There are more complicated cases. For example, the \(X\)'s and \(Y\)'s might be known to be independent chance variables, each with the same Poisson distribution, the parameter of the Poisson distribution being unknown. In this case, \(\theta\) will denote the parameter of the Poisson distribution and will range over all positive numbers. Thus, when \(\theta = 2.6\), it means that we are dealing with the Poisson distribution with parameter 2.6.

We use the symbol \(x\) as an index for the possible sets of values of \(X_1, \ldots, X_m\). Thus \(x\) ranges over certain points in \(m\)-dimensional space. We use the symbol \(y\) as an index for the possible sets of values of \(Y_1, \ldots, Y_n\). Thus \(y\) ranges over certain points in \(n\)-dimensional space.

We use the symbol \(D\) as an index for the possible decisions; that is, a particular value of \(D\) picks out a particular decision. In the case where there is only a finite number of possible decisions, say, \(L\) of them, we can list them in some definite order, and let \(D\) range over the integers from 1 to \(L\). Then, when \(D = i\), it means that we are dealing with the \(i\)th decision in our list. There are more complicated cases. For example, we shall come across cases where any value between 0 and \(\infty\) is a possible decision. Then \(D\) ranges over all nonnegative numbers.

The loss we incur when the values of \(X_1, \ldots, X_m\) are given by \(x\), the values of \(Y_1, \ldots, Y_n\) are given by \(y\), and the decision chosen is \(D\) is denoted by \(W(y;D;x)\). In many cases, the loss depends only on \(y\) and \(D\), and not explicitly on \(x\). In such a case, we write the loss as \(W(y;D)\).

When the joint distribution corresponding to \(\theta\) allows us to list the possible values of the chance variables with their probabilities, \(f(x,y;\theta)\) denotes the probability assigned to the \(m + n\) dimensional point \(x, y\) by this distribution. When the distribution corresponding to \(\theta\) has a joint pdf, \(f(x,y;\theta)\) denotes the value of this joint pdf at the \(m + n\) dimensional point \(x, y\).

A decision rule \(s\) is defined by nonnegative numbers \(s(D;x)\), where \(s(D;x)\) is the probability assigned by the decision rule \(s\) to choosing decision \(D\) when the point \(x\) is observed. Thus, when \(D\) can take on only \(L\) different values, say, \(1, 2, \ldots, L\), we have

\[ \sum_{D=1}^{L} s(D;x) = 1 \quad \text{for each } x \]

For each given decision rule s, the loss that will be incurred when using
s is a chance variable whose probability distribution depends upon the
unknown joint probability distribution of X_1, ..., X_m, Y_1, ..., Y_n.
The expected value of the loss that will be incurred when using the decision
rule s and when the joint probability distribution is given by θ will be
denoted by r(θ;s). r(θ;s) is often called "the risk when using s, and the
true distribution is given by θ."

If we are dealing with a problem in which there is a finite number L of
possible decisions and the joint distribution given by θ allows only a finite
number of possible m + n dimensional points x, y, then

\[
r(\theta;s) = \sum_{x} \sum_{y} \sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, s(D;x)
\]

where x and y in the summation run over all possible sets of values of
X_1, ..., X_m and Y_1, ..., Y_n allowed by the joint distribution. To see
this, it is necessary to note only that for a given θ, the possible
values of the loss are the values W(y;D;x) takes as y, D, and x vary, and
the probability that the value of the loss will be W(y;D;x) is equal to
f(x,y;θ)s(D;x), which is the probability that x, y will be observed and
that D will be chosen after x is observed. Then the formula for r(θ;s)
follows from the definition of expected value.

If we are dealing with a problem with a finite number L of possible
decisions, and the distribution corresponding to θ has a pdf f(x,y;θ), then

\[
r(\theta;s) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, s(D;x)\,
dx_1 \cdots dx_m\, dy_1 \cdots dy_n
\]

where x, y in the integral denote the vectors x_1, ..., x_m and
y_1, ..., y_n, respectively.

We close this section with two simple numerical examples, to illustrate
the concepts introduced above.
Example 1. We have to decide whether or not to buy a certain piece of
equipment for $500. The equipment may turn out to be defective or
not, and is not guaranteed. The alternative to buying this equipment is
to install a different type of guaranteed device for $1,000. Before
deciding, we shall have the opportunity to observe two similar pieces of
equipment produced by the same factory. The proportion of defectives
turned out by this factory is unknown. If we buy the equipment and it
turns out to be defective, we shall install the guaranteed device at an
additional cost of $1,000.

We introduce the following notation. The chance variable Y is
defined to be 1 if the piece of equipment we are considering turns out to be
defective, and 0 otherwise. X_1 is defined to be 1 if the first of the two
similar pieces of equipment to be observed turns out to be defective, and 0
otherwise. X_2 is defined the same way in terms of the second similar
piece of equipment to be observed. θ denotes the unknown proportion
of defectives turned out by the factory producing this equipment. Of
course, 0 ≤ θ ≤ 1. Then X_1, X_2, Y are independent chance variables,
each with the following probability distribution:

Possible values      0        1
Probability          1 - θ    θ

We label the decision to buy as decision number 1 and the decision to use
the guaranteed device as decision number 2. The loss function does not
depend explicitly on X_1 or X_2, and is seen to be as follows:

W(0;1) = 500, W(1;1) = 1,000 + 500, W(0;2) = W(1;2) = 1,000

Then, using our formula for r(θ;s) above, we have

\[
\begin{aligned}
r(\theta;s) ={}& [500(1-\theta)^3 + 1{,}500\,\theta(1-\theta)^2]\, s(1;0,0) \\
&+ [500\,\theta(1-\theta)^2 + 1{,}500\,\theta^2(1-\theta)]\,[s(1;0,1) + s(1;1,0)] \\
&+ [500\,\theta^2(1-\theta) + 1{,}500\,\theta^3]\, s(1;1,1) \\
&+ 1{,}000\,[(1-\theta)^3 + \theta(1-\theta)^2]\, s(2;0,0) \\
&+ 1{,}000\,[\theta(1-\theta)^2 + \theta^2(1-\theta)]\,[s(2;0,1) + s(2;1,0)] \\
&+ 1{,}000\,[\theta^2(1-\theta) + \theta^3]\, s(2;1,1)
\end{aligned}
\]

As a particular case, suppose s_1 is the decision rule with s_1(1;0,0) = 1,
s_1(1;0,1) = s_1(1;1,0) = 1/2, s_1(1;1,1) = 0. Then, recalling that s_1(2;x) =
1 - s_1(1;x) for all x, we find r(θ;s_1) = 500 + 1,500θ - 1,000θ^2. As
another case, suppose s_2 is the decision rule with s_2(1;0,0) = 0, s_2(1;0,1) =
s_2(1;1,0) = 1/2, s_2(1;1,1) = 1. Then we find that r(θ;s_2) = 1,000 -
500θ + 1,000θ^2.
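
The two closed forms above can be checked by evaluating the triple sum for r(θ;s) directly. The following is a minimal sketch, not part of the original text: the names `LOSS`, `risk`, `s1`, and `s2` are illustrative, and a rule is represented only by its buying probabilities s(1;x), with s(2;x) taken as 1 - s(1;x).

```python
from fractions import Fraction
from itertools import product

# Loss W(y; D) from Example 1: decision 1 = buy, decision 2 = guaranteed device.
LOSS = {(0, 1): 500, (1, 1): 1500, (0, 2): 1000, (1, 2): 1000}

def risk(theta, s_buy):
    """Return r(theta; s) for Example 1.

    s_buy maps each observation (x1, x2) to s(1; x1, x2), the probability
    of buying; s(2; x) is taken to be 1 - s(1; x).
    """
    total = 0
    for x1, x2, y in product((0, 1), repeat=3):
        k = x1 + x2 + y  # number of defectives among the three pieces
        prob = theta**k * (1 - theta) ** (3 - k)  # f(x, y; theta)
        p_buy = s_buy[(x1, x2)]
        total += prob * (LOSS[(y, 1)] * p_buy + LOSS[(y, 2)] * (1 - p_buy))
    return total

half = Fraction(1, 2)
s1 = {(0, 0): 1, (0, 1): half, (1, 0): half, (1, 1): 0}
s2 = {(0, 0): 0, (0, 1): half, (1, 0): half, (1, 1): 1}

theta = Fraction(1, 4)
print(risk(theta, s1), 500 + 1500 * theta - 1000 * theta**2)
print(risk(theta, s2), 1000 - 500 * theta + 1000 * theta**2)
```

Exact fractions are used so that the sum and the polynomial can be compared for equality rather than to within rounding error.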

Example 2. A company has to decide on the price it will charge for a
certain piece of equipment. It feels that it can charge $10,000 plus $100
times the length of time it guarantees the equipment (time is measured in
months). If it guarantees the equipment for D months, it refunds the
purchase price if the equipment fails before D months pass and refunds
nothing if the equipment lasts for at least D months. It is felt that the
length of life of the equipment has an exponential distribution with
unknown parameter θ (Sec. 4.17). Before making its decision, the
company will observe the length of life of three similar pieces of
equipment.

We introduce the following notation. Y is the length of life of the
guaranteed equipment, in months. X_1, X_2, X_3 are the lengths of life of
the three similar pieces of equipment that will be observed. Y, X_1, X_2, X_3
are independent chance variables, each with pdf equal to θe^{-θx} for x > 0,
zero for x < 0. D is the length of time the guarantee runs, in months.
The loss function does not depend on X_1, X_2, X_3, and is seen to be as
follows:

W(y;D) = -(10,000 + 100D)   if y ≥ D
W(y;D) = 0                  if y < D


We note that in this problem there is an infinite number of possible
decisions, and our formulas for r(θ;s) given above assume a finite number
of possible decisions. However, given a decision rule s, it is a simple
matter to compute r(θ;s). As an example, suppose our decision rule s_1 is
to set D equal to X_1 + X_2 + X_3. This means that s_1(D;x_1,x_2,x_3) is equal
to 1 if D = x_1 + x_2 + x_3 and is equal to 0 otherwise. Then we get

\[
r(\theta;s_1) = \int_0^\infty \int_0^\infty \int_0^\infty
\left[ \int_{x_1+x_2+x_3}^{\infty} \theta^4 e^{-\theta(x_1+x_2+x_3+y)}
\,[-10{,}000 - 100(x_1+x_2+x_3)]\, dy \right] dx_1\, dx_2\, dx_3
\]

To see this, we go back to our formula for r(θ;s) and note that since D is a
fixed function of X_1, X_2, X_3, the Σ operation disappears, and we simply
replace D by x_1 + x_2 + x_3 in the computation. When the integration is
carried out, we find r(θ;s_1) = -1,250 - 300/(16θ).
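
The value r(θ;s_1) = -1,250 - 300/(16θ) can be checked numerically. The sketch below is an illustration rather than the book's own computation: it reduces the fourfold integral to a single integral over t = x_1 + x_2 + x_3, using the standard facts that the sum of three independent exponential variables has the gamma density θ^3 t^2 e^{-θt}/2 and that P(Y ≥ t) = e^{-θt}. The function name, the cutoff 40/θ, and the number of Simpson panels are assumptions made for the example.

```python
import math

def risk_sum_rule(theta, n=20000):
    """Approximate r(theta; s_1) for the rule D = x1 + x2 + x3 by Simpson's rule.

    T = X1 + X2 + X3 has the gamma density theta^3 t^2 exp(-theta t) / 2,
    and P(Y >= t) = exp(-theta t), so the expected loss is the integral of
    -(10000 + 100 t) * exp(-theta t) * density(t) over t >= 0.
    """
    upper = 40.0 / theta  # integrand is negligible beyond this point
    h = upper / n  # n must be even for Simpson's rule

    def integrand(t):
        density = theta**3 * t**2 * math.exp(-theta * t) / 2.0
        return -(10000.0 + 100.0 * t) * math.exp(-theta * t) * density

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

for theta in (0.01, 0.02, 0.05):
    closed_form = -1250.0 - 300.0 / (16.0 * theta)
    print(theta, risk_sum_rule(theta), closed_form)
```

The two printed columns should agree closely for each θ; the risk is negative because the loss in this example is the negative of the company's receipts.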


5.7. The Comparison of Decision Rules. It is clear from our
discussion so far that in each statistical decision problem there are
infinitely many decision rules and that some criterion for characterizing a
decision rule as "good" or "bad" is needed.

Roughly speaking, we are going to consider a decision rule s "good"
when r(θ;s) is "small" for all possible values of θ in the problem. Since
r(θ;s) is the expected loss when the decision rule s is used and the
distribution corresponding to θ is the actual distribution, this is not an
unreasonable criterion for calling a decision rule good. We have to consider
r(θ;s) for all possible values of θ because we do not know which
distribution is the actual one.

To be more precise, suppose we are considering two decision rules
s_1 and s_2, characterized by the decision probabilities s_1(D;x) and
s_2(D;x), respectively. Suppose r(θ;s_1) ≤ r(θ;s_2) for all possible values
of θ, with r(θ;s_1) < r(θ;s_2) for at least one value of θ. Then s_1 is called
a "better" decision rule than s_2, and we would certainly never use the
decision rule s_2. A decision rule t is called "inadmissible" if there is a
decision rule which is better than t according to the definition of "better"
just given. For example, it can be verified that in Example 1 of Sec. 5.6,
s_1 is a better decision rule than s_2. This is not surprising, since s_2
specifies that the equipment is bought if both similar pieces of equipment
are observed to be defective. Thus s_2 in Example 1 of Sec. 5.6 is
inadmissible. Any decision rule which is not inadmissible is called
"admissible." It should be emphasized that just because s_1 of Example 1 has
been shown to be better than s_2 does not necessarily mean that s_1 is
admissible: there may be a decision rule s_3 which is better than s_1.
Whatever decision rule is finally used should certainly be an admissible
decision rule, and therefore it would be useful to develop a method for
finding admissible decision rules.

For the remainder of this section, we shall assume that our decision
problem contains a finite number h of possible joint probability
distributions for X_1, ..., X_m, Y_1, ..., Y_n, so that θ may be assumed to
range over the values 1, 2, ..., h. Then any decision rule s has associated
with it the h risks r(1;s), r(2;s), ..., r(h;s). These risks can be plotted
as a point in h-dimensional space. Thus any decision rule s has associated
with it a point in h-dimensional space. We denote by C the set of all
points in h-dimensional space associated with some decision rule. We
shall show that C is a convex set, by the following argument. Suppose
w_1, ..., w_h and z_1, ..., z_h are any two points in C. Then there are
decision rules s and t, such that r(θ;s) = w_θ and r(θ;t) = z_θ, for θ =
1, ..., h. Given any value q between 0 and 1, we define a decision rule
u, whose decision probabilities are u(D;x) = qs(D;x) + (1 - q)t(D;x).
Using the equation given in Sec. 5.6 for the case of a finite number of
possible values for the chance variables X_1, ..., X_m, Y_1, ..., Y_n (the
case where there is a joint pdf is handled analogously), we have

\[
\begin{aligned}
r(\theta;u) &= \sum_{x}\sum_{y}\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, u(D;x) \\
&= \sum_{x}\sum_{y}\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\,[q\,s(D;x) + (1-q)\,t(D;x)] \\
&= q \sum_{x}\sum_{y}\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, s(D;x)
 + (1-q) \sum_{x}\sum_{y}\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, t(D;x) \\
&= q\,r(\theta;s) + (1-q)\,r(\theta;t) = q\,w_\theta + (1-q)\,z_\theta
 \quad \text{for } \theta = 1, \ldots, h
\end{aligned}
\]

Thus the h-dimensional point qw_1 + (1 - q)z_1, ..., qw_h + (1 - q)z_h is
in C, since this point corresponds to the decision rule u. This proves
that C is a convex set, since as q varies between 0 and 1, we get the line
segment joining the points w_1, ..., w_h and z_1, ..., z_h.
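
The linearity used in this argument can be seen on a small case. The sketch below is illustrative only: it takes Example 1 of Sec. 5.6 with h = 2 and, purely as an assumption for the demonstration, proportions of defectives 1/4 and 3/4. The rules `s` and `t`, the mixing weight `q`, and the helper names are hypothetical choices.

```python
from fractions import Fraction
from itertools import product

LOSS = {(0, 1): 500, (1, 1): 1500, (0, 2): 1000, (1, 2): 1000}  # W(y; D)
PROPORTIONS = (Fraction(1, 4), Fraction(3, 4))  # illustrative choice, h = 2

def risk_point(s_buy):
    """Return the point (r(1; s), r(2; s)) for a rule given by s(1; x)."""
    point = []
    for p in PROPORTIONS:
        r = Fraction(0)
        for x1, x2, y in product((0, 1), repeat=3):
            k = x1 + x2 + y
            prob = p**k * (1 - p) ** (3 - k)
            buy = s_buy[(x1, x2)]
            r += prob * (LOSS[(y, 1)] * buy + LOSS[(y, 2)] * (1 - buy))
        point.append(r)
    return tuple(point)

def mix(q, s, t):
    """Decision probabilities u(D; x) = q s(D; x) + (1 - q) t(D; x)."""
    return {x: q * s[x] + (1 - q) * t[x] for x in s}

s = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 0}
t = {(0, 0): 0, (0, 1): Fraction(1, 3), (1, 0): 1, (1, 1): 1}
q = Fraction(2, 5)

w, z = risk_point(s), risk_point(t)
u_point = risk_point(mix(q, s, t))
expected = tuple(q * wi + (1 - q) * zi for wi, zi in zip(w, z))
print(u_point == expected)
```

The comparison holds for any q between 0 and 1, which is exactly the statement that the segment joining the two risk points lies in C.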

In the terminology of Sec. 5.1, it is clear that if s is an admissible
decision rule, no point in C can be below the point r(1;s), r(2;s), ..., r(h;s).
But then by the theorem of Sec. 5.1, there are h nonnegative numbers
b(1), ..., b(h), with b(1) + b(2) + ... + b(h) = 1, such that if t is any
decision rule,

\[
b(1)r(1;s) + b(2)r(2;s) + \cdots + b(h)r(h;s)
\le b(1)r(1;t) + b(2)r(2;t) + \cdots + b(h)r(h;t)
\]

We have proved the following theorem:

Theorem. If s is an admissible decision rule, then there are h
nonnegative numbers b(1), ..., b(h), with b(1) + ... + b(h) = 1, such that
for each and every decision rule t,

\[
b(1)r(1;s) + \cdots + b(h)r(h;s) \le b(1)r(1;t) + \cdots + b(h)r(h;t)
\]

The numbers b(1), ..., b(h) depend on the decision rule s and may not be
unique.

As a numerical example, we take a case where h = 2, so that we can
plot the set C. We simplify Example 1 of Sec. 5.6 by assuming that the
proportion of defectives turned out by the factory is known to be either
1/4 or 1/2. This requires a change in the interpretation of θ. Now when
θ = 1, it will mean that the common probability distribution of X_1, X_2, Y
is

Possible values      0      1
Probability          3/4    1/4

and when θ = 2, it will mean that the common probability distribution is

Possible values      0      1
Probability          1/2    1/2

Then, using the equation given in Sec. 5.6, we have

\[
\begin{aligned}
r(1;s) ={}& (27{,}000/64)\,s(1;0,0) + (9{,}000/64)\,[s(1;0,1) + s(1;1,0)] \\
&+ (3{,}000/64)\,s(1;1,1) + (36{,}000/64)\,[1 - s(1;0,0)] \\
&+ (12{,}000/64)\,[2 - s(1;0,1) - s(1;1,0)] + (4{,}000/64)\,[1 - s(1;1,1)]
\end{aligned}
\]

\[
r(2;s) = 1{,}000 \quad \text{for any } s
\]

In this case the set C consists of the line segment with r(2;s) = 1,000,
r(1;s) ranging from 48,000/64 to 64,000/64. A line segment is a convex
set. There is only one admissible decision rule in this problem: it is the
decision rule s_1 with s_1(1;x) = 1 for all possible x. Then r(1;s_1) =
48,000/64, r(2;s_1) = 1,000.

In general, the convex set C does not degenerate into a line segment, as it
did in the preceding example.

As another example, we modify the preceding example by assuming that the
proportion of defectives turned out by the factory is known to be either
1/4 or 3/4. Now when θ = 1 it will mean that the proportion is 1/4, and
when θ = 2 it will mean that the proportion is 3/4. Then using the
equation given in Sec. 5.6, we have

\[
\begin{aligned}
r(1;s) ={}& (27{,}000/64)\,s(1;0,0) + (9{,}000/64)\,[s(1;0,1) + s(1;1,0)] \\
&+ (3{,}000/64)\,s(1;1,1) + (36{,}000/64)\,[1 - s(1;0,0)] \\
&+ (12{,}000/64)\,[2 - s(1;0,1) - s(1;1,0)] + (4{,}000/64)\,[1 - s(1;1,1)] \\[4pt]
r(2;s) ={}& (5{,}000/64)\,s(1;0,0) + (15{,}000/64)\,[s(1;0,1) + s(1;1,0)] \\
&+ (45{,}000/64)\,s(1;1,1) + (4{,}000/64)\,[1 - s(1;0,0)] \\
&+ (12{,}000/64)\,[2 - s(1;0,1) - s(1;1,0)] + (36{,}000/64)\,[1 - s(1;1,1)]
\end{aligned}
\]

Letting s(1;0,0), s(1;0,1), s(1;1,0), and s(1;1,1) vary independently
between 0 and 1 and plotting the two-dimensional points [r(1;s), r(2;s)]
resulting, we get Fig. 5.5. The circled points represent decision rules
which do not use randomization. (A decision rule s which does not use
randomization is one for which s(D;x)[1 - s(D;x)] = 0 for all values of
D and x.) The following table gives r(1;s) and r(2;s) for all decision
rules in our problem not using randomization.


s(1;0,0)  s(1;0,1)  s(1;1,0)  s(1;1,1)   r(1;s)      r(2;s)
   1         1         1         1       48,000/64   80,000/64
   1         1         0         1       51,000/64   77,000/64
   1         0         1         1       51,000/64   77,000/64
   1         0         0         1       54,000/64   74,000/64
   1         1         1         0       49,000/64   71,000/64
   1         1         0         0       52,000/64   68,000/64
   1         0         1         0       52,000/64   68,000/64
   1         0         0         0       55,000/64   65,000/64
   0         1         1         1       57,000/64   79,000/64
   0         1         0         1       60,000/64   76,000/64
   0         0         1         1       60,000/64   76,000/64
   0         0         0         1       63,000/64   73,000/64
   0         1         1         0       58,000/64   70,000/64
   0         1         0         0       61,000/64   67,000/64
   0         0         1         0       61,000/64   67,000/64
   0         0         0         0       64,000/64   64,000/64


Note that although 16 decision rules are listed, they represent only 12
distinct points on the diagram. Also note that the boundary of the
convex set C consists of line segments joining points corresponding to
decision rules not using randomization. We shall prove later that this
is always the case, and thus a convenient way to sketch the set C is to plot
all the points corresponding to decision rules which do not use
randomization, and then draw line segments between pairs of these points in
such a way that each of the points is either on one of the line segments or
else contained inside the boundary formed by the line segments, and the
line segments enclose a convex set. These line segments then form the
boundary of the convex set C. The shaded part of the boundary in the
diagram represents the admissible decision rules.
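
The table of 16 nonrandomized rules, and the count of 12 distinct points, can be regenerated from the coefficients in the formulas for r(1;s) and r(2;s). The sketch below is illustrative: the dictionaries `BUY` and `NOT_BUY` hold those coefficients in units of 1/64, and the `dominated` helper compares nonrandomized rules with one another only, so the list it produces is a rough guide to the lower-left boundary and not by itself a proof of admissibility.

```python
from itertools import product

# Coefficients (in units of 1/64) read off the formulas for r(1; s) and r(2; s):
# cost of buying and of not buying for each observation x = (x1, x2).
BUY = {1: {(0, 0): 27000, (0, 1): 9000, (1, 0): 9000, (1, 1): 3000},
       2: {(0, 0): 5000, (0, 1): 15000, (1, 0): 15000, (1, 1): 45000}}
NOT_BUY = {1: {(0, 0): 36000, (0, 1): 12000, (1, 0): 12000, (1, 1): 4000},
           2: {(0, 0): 4000, (0, 1): 12000, (1, 0): 12000, (1, 1): 36000}}
XS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def risk_point(rule):
    """rule[x] is 1 (buy) or 0 (do not buy); returns 64 * (r(1; s), r(2; s))."""
    return tuple(
        sum(BUY[th][x] if rule[x] else NOT_BUY[th][x] for x in XS)
        for th in (1, 2)
    )

points = {}
for choice in product((1, 0), repeat=4):
    rule = dict(zip(XS, choice))
    points.setdefault(risk_point(rule), []).append(choice)

def dominated(p, others):
    """True if some other point is at least as small in both risks."""
    return any(o != p and o[0] <= p[0] and o[1] <= p[1] for o in others)

undominated = sorted(p for p in points if not dominated(p, points))
print(len(points))
print(undominated)
```

Each key of `points` is one plotted point, and its value lists the nonrandomized rules that share it, which is where the four coincidences in the table come from.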

As a third example, we simplify Example 2 of Sec. 5.6 by assuming that
the unknown parameter is known to be equal to either 0.02 or 0.04 and
that D must be one of the two values 40 or 60. It is then convenient to
introduce a change in the interpretation of θ. Now when θ = 1, it will
mean that the common pdf of X_1, X_2, X_3, Y is 0.02e^{-0.02x}, and when θ = 2,
it will mean that the common pdf is 0.04e^{-0.04x}. In this case the convex
set C is given by Fig. 5.6. The computations giving the boundary of C in
this case are tedious, and will be discussed in Sec. 5.12. Each point on
the boundary of C corresponds to a decision rule that does not use
randomization. There is only one point corresponding to an admissible
decision rule.

Fig. 5.6
5.8. Bayes Decision Rules. Throughout this section, we continue
our assumption that there are only h different possible joint distributions.
If b(1), b(2), ..., b(h) is a set of nonnegative numbers with b(1) + b(2) +
... + b(h) = 1, then a decision rule s is called a "Bayes decision rule
relative to b(1), b(2), ..., b(h)" if

\[
\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s) \le \sum_{\theta=1}^{h} b(\theta)\, r(\theta;t)
\]

for each and every decision rule t. If a rule s is called simply a "Bayes
decision rule," it means that there is some unspecified set of nonnegative
numbers b(1), b(2), ..., b(h), with b(1) + ... + b(h) = 1, such that s is a
Bayes decision rule relative to b(1), ..., b(h).

The theorem of Sec. 5.7 states that any admissible decision rule must
be a Bayes decision rule. Thus, in our search for admissible decision
rules, we can limit ourselves to Bayes decision rules. However, there
may be Bayes decision rules which are not admissible. For example, in
the first illustrative example of Sec. 5.7, the convex set C is a horizontal
line segment, and it is easily seen that every decision rule is a Bayes
decision rule relative to 0, 1. However, there is only one admissible
decision rule. Therefore it may be asked why we bother to pay attention to
Bayes decision rules, which may include among them some inadmissible
decision rules, when what we would really like to find are admissible
decision rules. The answer is that it is so simple to find the Bayes
decision rules that it is a useful first step in our search for the
admissible decision rules.

If s is a Bayes decision rule relative to b(1), ..., b(h) and b(θ) > 0 for
θ = 1, ..., h, then s must be admissible. This is proved as follows.
Suppose s were not admissible. Then there would be a decision rule t
with r(θ;t) ≤ r(θ;s) for θ = 1, ..., h, with r(θ;t) < r(θ;s) for at least one
value of θ, say, for θ = j. But then

\[
\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s) - \sum_{\theta=1}^{h} b(\theta)\, r(\theta;t)
= \sum_{\theta=1}^{h} b(\theta)\,[r(\theta;s) - r(\theta;t)]
\ge b(j)\,[r(j;s) - r(j;t)] > 0
\]

and therefore

\[
\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s) > \sum_{\theta=1}^{h} b(\theta)\, r(\theta;t),
\]

which contradicts the fact that s is a Bayes decision rule relative to
b(1), ..., b(h). This contradiction proves that s is admissible.


5.9. The Construction of Bayes Decision Rules When There Is a
Finite Number of Distributions and a Finite Number of Possible Decisions.
First we consider the case where each possible distribution allows us to
list the possible values of the chance variables. Then, for the decision
rule s with decision probabilities s(D;x), we have from Sec. 5.6 that

\[
r(\theta;s) = \sum_{x}\sum_{y}\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, s(D;x)
\]

Suppose we are given a set of nonnegative numbers b(1), ..., b(h), with
b(1) + ... + b(h) = 1, and we want to find a decision rule s that is a
Bayes decision rule relative to b(1), ..., b(h). Then s must be chosen to
make \(\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)\) as small as possible. But

\[
\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)
= \sum_{x}\sum_{D=1}^{L} s(D;x) \sum_{\theta=1}^{h}\sum_{y} b(\theta)\, W(y;D;x)\, f(x,y;\theta)
\]

Let us denote the expression
\(\sum_{\theta=1}^{h}\sum_{y} b(\theta)\, W(y;D;x)\, f(x,y;\theta)\) by K(D;x).
Then s must be chosen to make \(\sum_{x}\sum_{D=1}^{L} s(D;x)K(D;x)\) as small
as possible. This will be done if for each x, the quantities
s(1;x), ..., s(L;x) are set so as to minimize
\(\sum_{D=1}^{L} s(D;x)K(D;x)\). Since s(1;x), ..., s(L;x) are nonnegative
and add to unity, it is clear that the minimum will be achieved if
and only if s(D;x) is set equal to zero for every D for which K(D;x) is
greater than the smallest of the quantities K(1;x), ..., K(L;x). In
other words, for a given x we should never choose a D that did not
minimize K(D;x).

From our discussion, we see that different decision rules may be Bayes
decision rules relative to the same b(1), ..., b(h). This happens when
for some x, K(D;x) is minimized for more than one value of D. If i and j
are two different integers with K(i;x) = K(j;x) = min_D K(D;x), the
construction above says nothing about the relative sizes of
s(i;x) and s(j;x), though of course

\[
s(D;x) \ge 0 \quad \text{and} \quad \sum_{D=1}^{L} s(D;x) = 1
\]

are conditions which must always be satisfied.
As an example, we take the second example of Sec. 5.7, in which
there are two distributions given by proportions of defectives of
1/4 and 3/4, respectively. We shall find a Bayes decision rule relative to
1/2, 1/2. To do this, we compute K(1;x) and K(2;x) for all four possible
values of x. The computations are as follows:

\[
\begin{aligned}
K(1;0,0) &= \tfrac12\left[500(\tfrac34)^3 + 1{,}500(\tfrac14)(\tfrac34)^2\right]
 + \tfrac12\left[500(\tfrac14)^3 + 1{,}500(\tfrac34)(\tfrac14)^2\right] = 32{,}000/128 \\
K(2;0,0) &= \tfrac12\left[1{,}000(\tfrac34)^3 + 1{,}000(\tfrac14)(\tfrac34)^2\right]
 + \tfrac12\left[1{,}000(\tfrac14)^3 + 1{,}000(\tfrac34)(\tfrac14)^2\right] = 40{,}000/128 \\
K(1;0,1) = K(1;1,0) &= \tfrac12\left[500(\tfrac14)(\tfrac34)^2 + 1{,}500(\tfrac14)^2(\tfrac34)\right]
 + \tfrac12\left[500(\tfrac34)(\tfrac14)^2 + 1{,}500(\tfrac34)^2(\tfrac14)\right] = 24{,}000/128 \\
K(2;0,1) = K(2;1,0) &= \tfrac12\left[1{,}000(\tfrac14)(\tfrac34)^2 + 1{,}000(\tfrac14)^2(\tfrac34)\right]
 + \tfrac12\left[1{,}000(\tfrac34)(\tfrac14)^2 + 1{,}000(\tfrac34)^2(\tfrac14)\right] = 24{,}000/128 \\
K(1;1,1) &= \tfrac12\left[500(\tfrac14)^2(\tfrac34) + 1{,}500(\tfrac14)^3\right]
 + \tfrac12\left[500(\tfrac34)^2(\tfrac14) + 1{,}500(\tfrac34)^3\right] = 48{,}000/128 \\
K(2;1,1) &= \tfrac12\left[1{,}000(\tfrac14)^2(\tfrac34) + 1{,}000(\tfrac14)^3\right]
 + \tfrac12\left[1{,}000(\tfrac34)^2(\tfrac14) + 1{,}000(\tfrac34)^3\right] = 40{,}000/128
\end{aligned}
\]

Since K(1;0,0) < K(2;0,0), a Bayes decision rule relative to 1/2, 1/2 surely
chooses decision 1 when x_1 = 0, x_2 = 0. Similarly, a Bayes decision rule
relative to 1/2, 1/2 surely chooses decision 2 when x_1 = 1, x_2 = 1. Since
K(1;1,0) = K(2;1,0) and K(1;0,1) = K(2;0,1), a Bayes decision rule
relative to 1/2, 1/2 can assign any value between 0 and 1 to s(1;1,0) and
to s(1;0,1). In summary, a decision rule s is a Bayes decision rule relative
to 1/2, 1/2 if and only if s(1;0,0) = 1 and s(1;1,1) = 0. For such a
decision rule,

\[
\begin{aligned}
r(1;s) &= 55{,}000/64 - (3{,}000/64)\,[s(1;0,1) + s(1;1,0)] \\
r(2;s) &= 65{,}000/64 + (3{,}000/64)\,[s(1;0,1) + s(1;1,0)]
\end{aligned}
\]

As s(1;0,1) and s(1;1,0) vary between 0 and 1, the points [r(1;s), r(2;s)]
give the line segment joining (49,000/64, 71,000/64) and (55,000/64,
65,000/64). (See Fig. 5.5 in Sec. 5.7.)

As another illustrative example, we find a Bayes decision rule relative
to 2/5, 3/5 for the preceding problem. We compute K(1;x) and K(2;x) as
follows:

\[
\begin{aligned}
K(1;0,0) &= \tfrac25\left[500(\tfrac34)^3 + 1{,}500(\tfrac14)(\tfrac34)^2\right]
 + \tfrac35\left[500(\tfrac14)^3 + 1{,}500(\tfrac34)(\tfrac14)^2\right] = 13{,}800/64 \\
K(2;0,0) &= \tfrac25\left[1{,}000(\tfrac34)^3 + 1{,}000(\tfrac14)(\tfrac34)^2\right]
 + \tfrac35\left[1{,}000(\tfrac14)^3 + 1{,}000(\tfrac34)(\tfrac14)^2\right] = 16{,}800/64 \\
K(1;0,1) &= K(1;1,0) = 12{,}600/64 \\
K(2;0,1) &= K(2;1,0) = 12{,}000/64 \\
K(1;1,1) &= 28{,}200/64 \\
K(2;1,1) &= 23{,}200/64
\end{aligned}
\]

There is only one Bayes decision rule relative to 2/5, 3/5: it is the decision
rule s with s(1;0,0) = 1, s(1;0,1) = s(1;1,0) = s(1;1,1) = 0. For this
decision rule s, r(1;s) = 55,000/64, r(2;s) = 65,000/64.
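
The construction of this section is mechanical enough to write down as a short routine. The sketch below is illustrative, not the book's notation made official: `K` evaluates the weighted sum for the two-distribution problem with proportions 1/4 and 3/4, and `bayes_decisions` returns, for each x, the set of decisions that minimize K(D;x). A set with more than one element is exactly the situation in which a Bayes decision rule may randomize freely.

```python
from fractions import Fraction
from itertools import product

LOSS = {(0, 1): 500, (1, 1): 1500, (0, 2): 1000, (1, 2): 1000}  # W(y; D)
PROPORTIONS = {1: Fraction(1, 4), 2: Fraction(3, 4)}  # theta -> proportion

def K(D, x, b):
    """K(D; x) = sum over theta and y of b(theta) W(y; D) f(x, y; theta)."""
    total = Fraction(0)
    for theta, p in PROPORTIONS.items():
        for y in (0, 1):
            k = x[0] + x[1] + y
            total += b[theta] * LOSS[(y, D)] * p**k * (1 - p) ** (3 - k)
    return total

def bayes_decisions(b):
    """For each x, the set of decisions D that minimize K(D; x)."""
    result = {}
    for x in product((0, 1), repeat=2):
        values = {D: K(D, x, b) for D in (1, 2)}
        smallest = min(values.values())
        result[x] = {D for D, v in values.items() if v == smallest}
    return result

equal_weights = {1: Fraction(1, 2), 2: Fraction(1, 2)}
other_weights = {1: Fraction(2, 5), 2: Fraction(3, 5)}
print(bayes_decisions(equal_weights))
print(bayes_decisions(other_weights))
```

With weights 1/2, 1/2 the observations (0,1) and (1,0) each admit both decisions; with weights 2/5, 3/5 every observation has a single minimizing decision, so the Bayes decision rule is unique.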

Next we consider the case where each possible distribution has a joint
pdf for X_1, ..., X_m, Y_1, ..., Y_n. f(x,y;θ) denotes the joint pdf for the
θth distribution in our list. Using the formula given in Sec. 5.6,

\[
r(\theta;s) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\sum_{D=1}^{L} W(y;D;x)\, f(x,y;\theta)\, s(D;x)\,
dx_1 \cdots dx_m\, dy_1 \cdots dy_n
\]

and

\[
\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \sum_{D=1}^{L} s(D;x)
\left[ \sum_{\theta=1}^{h} b(\theta) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
W(y;D;x)\, f(x,y;\theta)\, dy_1 \cdots dy_n \right] dx_1 \cdots dx_m
\]

Denoting

\[
\sum_{\theta=1}^{h} b(\theta) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
W(y;D;x)\, f(x,y;\theta)\, dy_1 \cdots dy_n
\]

by K(D;x), we have

\[
\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
\sum_{D=1}^{L} s(D;x)\, K(D;x)\, dx_1 \cdots dx_m
\]

In order for s to be a Bayes decision rule relative to b(1), b(2), ..., b(h),
we must choose the values s(D;x) to minimize
\(\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \sum_{D=1}^{L} s(D;x)K(D;x)\, dx_1 \cdots dx_m\).
This will be done if for each x, the quantities
s(1;x), ..., s(L;x) are set so as to minimize
\(\sum_{D=1}^{L} s(D;x)K(D;x)\). By the
same reasoning already used, the minimum will be achieved if s(D;x) is
set equal to zero for every D for which K(D;x) is greater than the smallest
of the quantities K(1;x), ..., K(L;x).
As a numerical example, we simplify Example 2 of Sec. 5.6 by assuming
that D can take only the values 20, 40, and 60 and no others and that the
value of the unknown parameter is known to be one of the values 0.01,
0.03, 0.05. To bring the notation into line with the notation we have
been using, where D ranged from 1 to L and θ from 1 to h, we should really
relabel the decisions and distributions. However, we shall not bother
to do this, and no confusion will result. Suppose we want to find a
Bayes decision rule s relative to 1/4, 1/2, 1/4; that is, we want to find
a decision rule s that will minimize (1/4)r(0.01;s) + (1/2)r(0.03;s) +
(1/4)r(0.05;s). We find

\[
K(D;x) = \left[ \tfrac14 (0.01)^3 e^{-0.01(x_1+x_2+x_3+D)}
+ \tfrac12 (0.03)^3 e^{-0.03(x_1+x_2+x_3+D)}
+ \tfrac14 (0.05)^3 e^{-0.05(x_1+x_2+x_3+D)} \right] (-10{,}000 - 100D)
\]

Clearly, K(D;x) depends on x_1, x_2, x_3 only through the sum x_1 + x_2 + x_3.
For any specific value for x_1 + x_2 + x_3, we can compute K(20;x),
K(40;x), K(60;x) and take the appropriate action. Thus, if x_1 + x_2 + x_3
equals 80, K(20;x) = -0.0117, K(40;x) = -0.0073, K(60;x) =
-0.0047, and s must have s(20;x) = 1, so that if x_1 + x_2 + x_3 = 80,
the decision D = 20 is certainly chosen.
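
The three values of K(D;x) at x_1 + x_2 + x_3 = 80 can be recomputed from the formula. The sketch below assumes the a priori weights 1/4, 1/2, 1/4 on the parameter values 0.01, 0.03, 0.05; these weights are used because they reproduce the quoted values -0.0117 and -0.0073. The names `THETAS`, `WEIGHTS`, and `K` are illustrative.

```python
import math

THETAS = (0.01, 0.03, 0.05)
WEIGHTS = (0.25, 0.5, 0.25)  # b(theta) for the three parameter values

def K(D, x_sum):
    """K(D; x) for the simplified Example 2, as a function of x1 + x2 + x3."""
    mixture = sum(
        b * theta**3 * math.exp(-theta * (x_sum + D))
        for b, theta in zip(WEIGHTS, THETAS)
    )
    return mixture * (-10000.0 - 100.0 * D)

values = {D: K(D, 80.0) for D in (20, 40, 60)}
best = min(values, key=values.get)
print({D: round(v, 4) for D, v in values.items()}, best)
```

Calling `K` with other values of the sum shows how the preferred guarantee length changes with the observed lifetimes.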


5.10. The Construction of Bayes Decision Rules When There Is an
Infinite Number of Possible Distributions and a Finite Number of
Possible Decisions. Our definition of admissible decision rule covers
the case of an infinite number of possible distributions, but our definition
of Bayes decision rule covers only the case of a finite number of possible
distributions. Our first task is to extend the definition of Bayes decision
rule to the case of an infinite number of possible distributions.

In the case of a finite number of possible distributions, a Bayes decision
rule s relative to b(1), ..., b(h) is a decision rule that makes
\(\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)\)
as small as possible. We note that if θ were a chance variable with the
distribution

Possible values      1       2       ...     h
Probabilities        b(1)    b(2)    ...     b(h)

then \(\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)\) would be E{r(θ;s)} for
each decision rule s. Of course, we do not consider θ to be a chance
variable, but the interpretation of
\(\sum_{\theta=1}^{h} b(\theta)\, r(\theta;s)\) as the expected value of a
function of a pretended chance variable θ allows us to extend the
definition of a Bayes decision rule to the case of an infinite number of
possible distributions.

Suppose B(θ) is a given cdf for a chance variable θ, which assigns all the
probability to the possible values of θ in our decision problem. For any
decision rule s, we denote by R(s;B(θ)) the expected value r(θ;s) would
have if θ were a chance variable with cdf B(θ). A decision rule s is called
a "Bayes decision rule relative to B(θ)" if R(s;B(θ)) ≤ R(t;B(θ)) for each
and every decision rule t. This definition covers the case of a finite
number of possible distributions, as well as the case of an infinite number
of possible distributions, and coincides with the definition previously
given for the case of a finite number of possible distributions. A cdf used
for the purpose of defining a Bayes decision rule, as B(θ) was used, is
called an "a priori distribution." Again we emphasize the fact that θ is
not a chance variable, but an unknown constant. The introduction of
the cdf B(θ) is just a technical device to enable us to extend the definition
of a Bayes decision rule to the case of an infinite number of possible
distributions.

If B(θ) can be differentiated to give a pdf b(θ), then R(s;B(θ)) =
∫ r(θ;s)b(θ) dθ. Let us assume that we are dealing with a problem where
r(θ;s) is a continuous function of θ, for each decision rule s. Then if the
pdf b(θ) is positive for all possible values of θ, and if s is a Bayes decision
rule relative to B(θ), s must be admissible. To prove this, suppose s were
not admissible. Then there would be a decision rule t, with r(θ;t) ≤ r(θ;s)
for all θ, with strict inequality for at least one value of θ, say, for θ'. Thus
r(θ';t) < r(θ';s). But r(θ;s) and r(θ;t) are continuous functions of θ, and
therefore there must be values A, B, with A < B and θ' contained between
A and B, such that r(θ;t) < r(θ;s) for all θ in the interval from A to B.
Since r(θ;t) - r(θ;s) is never positive outside this interval,

\[
R(t;B(\theta)) - R(s;B(\theta)) \le \int_A^B [r(\theta;t) - r(\theta;s)]\, b(\theta)\, d\theta < 0
\]

and we get R(t;B(θ)) < R(s;B(θ)), which is a contradiction, since s is a Bayes
decision rule relative to B(θ). This contradiction proves that s is
admissible. [The nonexistence of b(θ) at a finite number of points does
not affect the argument.]
The preceding paragraph shows that in the case of an infinite number
of possible distributions, the class of all Bayes decision rules contains
many admissible decision rules. But are all the admissible decision rules
contained among the Bayes decision rules, as in the case of a finite number
of possible distributions? The answer is, in general, in the negative:
some admissible decision rules are not Bayes decision rules. To examine
this question further, suppose s_1, s_2, ... is an infinite sequence of Bayes
decision rules. A decision rule s is called the "limit of the sequence
s_1, s_2, ..." if

\[
r(\theta;s) = \lim_{j \to \infty} r(\theta;s_j)
\]

for all possible values of θ in the problem. Then we have the following
theorem for the case of an infinite number of possible distributions:
Each admissible decision rule is either a Bayes decision rule, or else is a
limit of a sequence of Bayes decision rules. We shall not prove this
theorem.
For a given a priori distribution B(θ), the construction of a Bayes
decision rule relative to B(θ) is a straightforward matter. For example,
suppose B(θ) has a pdf b(θ), and for each θ, the joint distribution of
X_1, ..., X_m, Y_1, ..., Y_n allows the possible values to be listed. Then

\[
R(s;B(\theta)) = \int \left[ \sum_{x}\sum_{y}\sum_{D=1}^{L}
W(y;D;x)\, f(x,y;\theta)\, s(D;x) \right] b(\theta)\, d\theta
= \sum_{x}\sum_{D=1}^{L} s(D;x)\, K(D;x)
\]

where

\[
K(D;x) = \int \left[ \sum_{y} W(y;D;x)\, f(x,y;\theta) \right] b(\theta)\, d\theta
\]

Thus s is a Bayes decision rule relative to B(θ) if for each x, s(D;x) is
set equal to zero for every D for which K(D;x) is greater than the
smallest of the quantities K(1;x), ..., K(L;x). If for each θ, the joint
distribution of X_1, ..., X_m, Y_1, ..., Y_n has a pdf f(x,y;θ), then we define
K(D;x) as

\[
K(D;x) = \int \left[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
W(y;D;x)\, f(x,y;\theta)\, dy_1 \cdots dy_n \right] b(\theta)\, d\theta
\]

and use it the same way as above.
As a numerical example, we take Example 1 of Sec. 5.6 and find a
Bayes decision rule relative to the cdf B(θ) defined as follows:

B(θ) = 0   for θ < 0
B(θ) = θ   for 0 ≤ θ ≤ 1
B(θ) = 1   for θ > 1

The pdf b(θ) corresponding to B(θ) is

b(θ) = 1   for 0 < θ < 1
b(θ) = 0   for θ < 0 or θ > 1

Computing K(D;x), we get

\[
\begin{aligned}
K(1;0,0) &= \int_0^1 [500(1-\theta)^3 + 1{,}500\,\theta(1-\theta)^2]\, d\theta = 250 \\
K(2;0,0) &= \int_0^1 [1{,}000(1-\theta)^3 + 1{,}000\,\theta(1-\theta)^2]\, d\theta = 1{,}000/3 \\
K(1;0,1) = K(1;1,0) &= \int_0^1 [500\,\theta(1-\theta)^2 + 1{,}500\,\theta^2(1-\theta)]\, d\theta = 2{,}000/12 \\
K(2;0,1) = K(2;1,0) &= \int_0^1 [1{,}000\,\theta(1-\theta)^2 + 1{,}000\,\theta^2(1-\theta)]\, d\theta = 2{,}000/12 \\
K(1;1,1) &= \int_0^1 [500\,\theta^2(1-\theta) + 1{,}500\,\theta^3]\, d\theta = 5{,}000/12 \\
K(2;1,1) &= \int_0^1 [1{,}000\,\theta^2(1-\theta) + 1{,}000\,\theta^3]\, d\theta = 1{,}000/3
\end{aligned}
\]

Thus any decision rule s with s(1;0,0) = 1 and s(1;1,1) = 0 is a Bayes
decision rule relative to the given B(θ). Since it is easily verified that
r(θ;s) in this problem is a continuous function of θ for each given s, and
since our b(θ) is positive for all possible θ (with the inconsequential
exception of the points θ = 0, θ = 1), any decision rule s which is Bayes
relative to the given B(θ) is admissible. In particular, the decision rule s_1
of Example 1 of Sec. 5.6 is admissible.
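
The six integrals for the uniform a priori distribution can be evaluated exactly, since each integrand is a sum of terms θ^a (1 - θ)^b and the integral of such a term over (0, 1) is a! b! / (a + b + 1)!. The sketch below is illustrative and uses hypothetical helper names `beta_integral` and `K`.

```python
from fractions import Fraction
from math import factorial

LOSS = {(0, 1): 500, (1, 1): 1500, (0, 2): 1000, (1, 2): 1000}  # W(y; D)

def beta_integral(a, b):
    """Integral of theta^a (1 - theta)^b over [0, 1] for nonnegative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def K(D, x):
    """K(D; x) under the uniform a priori pdf b(theta) = 1 on (0, 1)."""
    total = Fraction(0)
    for y in (0, 1):
        k = x[0] + x[1] + y  # exponent of theta in f(x, y; theta)
        total += LOSS[(y, D)] * beta_integral(k, 3 - k)
    return total

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, K(1, x), K(2, x))
```

The tie between K(1;x) and K(2;x) at x = (0,1) and x = (1,0) is what leaves s(1;0,1) and s(1;1,0) unrestricted.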

In the discussion above, we have been implicitly assuming that the
different possible joint distributions are given by the variation of a single
parameter θ. However, in many important problems, the possible joint
distributions are given by the variation of more than one parameter. For
example, the possible distributions may be all possible normal
distributions, and two separate parameters are necessary to specify a normal
distribution. In such a case, we use as an a priori distribution a joint
cdf B(θ_1,θ_2) for the parameters θ_1, θ_2. All the definitions and
computations remain essentially the same.

5.11. The Construction of Bayes Decision Rules When There Is an
Infinite Number of Possible Decisions. When there is an infinite
number of possible decisions, we compute K(D;x) exactly as described
above. Then for each x, we set s(D;x) = 0 unless K(D;x) = min_d K(d;x).

As a numerical example, we take Example 2 of Sec. 5.6, and we find
a Bayes decision rule s relative to the a priori distribution B(θ) = 1 - e^{-θ}
for θ > 0. This a priori cdf has pdf b(θ) = e^{-θ} for θ > 0. Then we
have

\[
\begin{aligned}
K(D;x) &= \int_0^\infty \left[ \int_D^\infty -(10{,}000 + 100D)\, \theta^4
\exp[-\theta(x_1 + x_2 + x_3 + y)]\, dy \right] e^{-\theta}\, d\theta \\
&= -6(10{,}000 + 100D)(x_1 + x_2 + x_3 + D + 1)^{-4}
\end{aligned}
\]

It is easy to show that K(D;x) is minimized by setting D equal to
(x_1 + x_2 + x_3 - 399)/3 if this quantity is positive, and by setting D
equal to 0 otherwise, when it is taken into account that D cannot be
negative. Then we get as a Bayes decision rule: choose D equal to
(x_1 + x_2 + x_3 - 399)/3 if this is positive; otherwise choose D = 0.
This decision rule is admissible, since b(θ) is positive for all possible
values of θ.
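
The minimizing guarantee length follows from differentiating K(D;x) = -6(10,000 + 100D)(x_1 + x_2 + x_3 + D + 1)^{-4} with respect to D, which gives the stationary point D = (x_1 + x_2 + x_3 - 399)/3. The sketch below compares that value, truncated at zero, with a brute-force search over a grid. The function names, the grid spacing, and the search range are assumptions made for the illustration.

```python
def K(D, x_sum):
    """K(D; x) = -6 (10000 + 100 D) (x1 + x2 + x3 + D + 1)^(-4)."""
    return -6.0 * (10000.0 + 100.0 * D) / (x_sum + D + 1.0) ** 4

def bayes_guarantee(x_sum):
    """Stationary point of K in D, truncated at zero because D >= 0."""
    return max(0.0, (x_sum - 399.0) / 3.0)

def grid_minimizer(x_sum, upper=1000.0, steps=20000):
    """Brute-force check: the grid value of D with the smallest K(D; x)."""
    candidates = (i * upper / steps for i in range(steps + 1))
    return min(candidates, key=lambda D: K(D, x_sum))

for x_sum in (100.0, 399.0, 600.0, 1500.0):
    print(x_sum, bayes_guarantee(x_sum), grid_minimizer(x_sum))
```

For observed totals below 399 months the search settles at D = 0, which matches the rule of offering no guarantee unless the observed lifetimes are long.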
Now we can show that each point on the boundary of the convex set C
represents a decision rule not using randomization, or else lies on a line
segment joining two points of C representing decision rules not using
randomization. Suppose this were not so. Denote by C' the convex set
we get by plotting all the points corresponding to decision rules not using
randomization, and then drawing line segments between pairs of these
points in such a way that each of the points is either on one of the line
segments or else contained inside the boundary formed by the line
segments, and the line segments enclose a convex set. Then we would have
the situation of Fig. 5.7, where p is a point on the boundary of C. But
then there is an A, B such that the point p is on one of the lines
Ar(1;s) + Br(2;s) = g_1(A,B), Ar(1;s) + Br(2;s) = g_2(A,B), while no
point of C' is on the line. However, this is a contradiction, since the line
must contain a point corresponding to a decision rule not using
randomization; that is, it must contain a point of C'. This proves that each
point on the boundary of C represents a decision rule not using
randomization, or else lies on a line segment joining two points of C
representing decision rules not using randomization.

Fig. 5.7

In general, it is true that the boundary of C (and therefore C itself) is
completely determined by the decision rules not using randomization.

For the third example of Sec. 5.7, we sketched the convex set $C$ without showing the computations. Now we are in a position to go through the computations. We fix $A$, $B$ and find the decision rules which lie on the line $Ar(1;s) + Br(2;s) = g_1(A,B)$. We have

\[
K(40;x) = -A\int_{40}^{\infty} (14{,}000)(0.02)^4 \exp[-0.02(x_1 + x_2 + x_3 + y)]\,dy
 - B\int_{40}^{\infty} (14{,}000)(0.04)^4 \exp[-0.04(x_1 + x_2 + x_3 + y)]\,dy
\]

\[
K(60;x) = -A\int_{60}^{\infty} (16{,}000)(0.02)^4 \exp[-0.02(x_1 + x_2 + x_3 + y)]\,dy
 - B\int_{60}^{\infty} (16{,}000)(0.04)^4 \exp[-0.04(x_1 + x_2 + x_3 + y)]\,dy
\]

After the integrations are performed, it turns out that if $A$ and $B$ are both positive, then $K(40;x) < K(60;x)$ for all possible values of $x_1$, $x_2$, $x_3$, so that if $A$, $B$ are positive, the only decision rule on the line $Ar(1;s) + Br(2;s) = g_1(A,B)$ is the decision rule for which $s(40;x) = 1$ for all $x$. For this decision rule, $r(1;s)$ is approximately $-6{,}290$, $r(2;s)$ is approximately $-2{,}830$.
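The comparison of $K(40;x)$ with $K(60;x)$ and the two expected losses just quoted can be checked numerically. The sketch below assumes, as the displayed integrals indicate, that the loss is $-(10{,}000 + 100D)$ when $y$ exceeds $D$ and zero otherwise, and that $Y$ has pdf $\theta e^{-\theta y}$ with $\theta = 0.02$ for distribution 1 and $\theta = 0.04$ for distribution 2. The function names are hypothetical.

```python
import math


def K(D, s, A, B):
    """K(D;x) after integrating out y; s = x1 + x2 + x3 and D is 40 or 60."""
    gain = 10_000 + 100 * D  # 14,000 for D = 40 and 16,000 for D = 60
    return -gain * (A * 0.02 ** 3 * math.exp(-0.02 * (s + D))
                    + B * 0.04 ** 3 * math.exp(-0.04 * (s + D)))


def risk_always(D, theta):
    """Expected loss of the rule that chooses D whatever the observations are."""
    return -(10_000 + 100 * D) * math.exp(-theta * D)


if __name__ == "__main__":
    # With A and B both positive, decision 40 beats decision 60 for every s.
    print(all(K(40, s, 1.0, 1.0) < K(60, s, 1.0, 1.0) for s in range(0, 2001)))
    print(risk_always(40, 0.02), risk_always(40, 0.04))
```

The two printed expected losses should agree with the rounded figures $-6{,}290$ and $-2{,}830$ given in the text.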


Denote

\[
50 \log\left(-\frac{B}{A}\right) + 50 \log \frac{8e^{-0.8} - \frac{64}{7}e^{-1.6}}{1 - \frac{8}{7}e^{-0.4}}
\]

by $w$. If $A$ is positive and $B$ is negative, $K(40;x) < K(60;x)$ if and only if $x_1 + x_2 + x_3 > w$. Thus if $A$ is positive and $B$ is negative, the only decision rule on the line $Ar(1;s) + Br(2;s) = g_1(A,B)$ is the decision rule for which $s(40;x) = 1$ if $x_1 + x_2 + x_3 > w$, and $s(40;x) = 0$ if $x_1 + x_2 + x_3 < w$. For this decision rule,

\[ r(1;s) = -4{,}810 - 1{,}480\,e^{-0.02w}\left[1 + 0.02w + \tfrac{1}{2}(0.02w)^2\right] \]

\[ r(2;s) = -1{,}450 - 1{,}380\,e^{-0.04w}\left[1 + 0.04w + \tfrac{1}{2}(0.04w)^2\right] \]

If $A$ is negative and $B$ is positive, $K(40;x) < K(60;x)$ if and only if $x_1 + x_2 + x_3 < w$. In this case, the only decision rule on the line $Ar(1;s) + Br(2;s) = g_1(A,B)$ is the decision rule for which $s(40;x) = 1$ if $x_1 + x_2 + x_3 < w$, and $s(40;x) = 0$ if $x_1 + x_2 + x_3 > w$. For this decision rule, $r(1;s) = -6{,}290 + 1{,}480\,e^{-0.02w}[1 + 0.02w + \tfrac{1}{2}(0.02w)^2]$, $r(2;s) = -2{,}830 + 1{,}380\,e^{-0.04w}[1 + 0.04w + \tfrac{1}{2}(0.04w)^2]$. If $A$ and $B$ are both negative, there is only one decision rule on the line $Ar(1;s) + Br(2;s) = g_1(A,B)$, and it has $r(1;s) = -4{,}810$, $r(2;s) = -1{,}450$. Thus we have found all the boundary points of $C$ on the line $Ar(1;s) + Br(2;s) = g_1(A,B)$. The boundary points of $C$ on the line $Ar(1;s) + Br(2;s) = g_2(A,B)$ are already included in the ones we have found, because $g_2(-A,-B) = -g_1(A,B)$.

Note that, in the preceding example, every point on the boundary of $C$ corresponded to a decision rule not using randomization. This illustrates the following theorem: In any decision problem where there is a finite number of possible joint distributions and each of the distributions has a pdf, then corresponding to any point on the boundary of the convex set $C$ there is a decision rule not using randomization.

We shall not prove the theorem just stated, but we point out an important consequence. If $s$ is any admissible decision rule, $s$ corresponds to a point $Q$ on the boundary of $C$. But by the theorem just stated, there is a decision rule $t$ not using randomization which corresponds to the point $Q$. This means that $r(\theta;s) = r(\theta;t)$ for all $\theta$, which in turn means that $s$ and $t$ are equivalent from our point of view. Thus we need never use a decision rule which uses randomization, since there is always a decision rule not using randomization which is as good. (We emphasize that this last statement applies to a problem with a finite number of possible distributions, each having a pdf.)


5.13. Sufficiency. Suppose that we have a decision problem in which there are $u$ functions of $x_1, \ldots, x_m$, denoted by $z_1, \ldots, z_u$, with the following two properties:

1. There are functions $A(x_1, \ldots, x_m)$, $g(z_1, \ldots, z_u, y_1, \ldots, y_n; \theta)$, never negative, such that for all $\theta$, $f(x_1, \ldots, x_m, y_1, \ldots, y_n; \theta) = A(x_1, \ldots, x_m)\,g(z_1, \ldots, z_u, y_1, \ldots, y_n; \theta)$, identically in the x's and y's.

2. There is a function $V(y_1, \ldots, y_n; D; z_1, \ldots, z_u)$ such that $W(x_1, \ldots, x_m, y_1, \ldots, y_n; D) = V(y_1, \ldots, y_n; D; z_1, \ldots, z_u)$, identically in the arguments.

Then we shall show that any admissible decision rule can be based on a knowledge of the values of $z_1, \ldots, z_u$ alone, with no necessity for knowing the individual values of $x_1, \ldots, x_m$.

Before showing this, we give two examples. First we note that property 2 is automatically satisfied in any problem in which the loss function depends only on $y_1, \ldots, y_n$, and not on $x_1, \ldots, x_m$. Turning to Example 1 of Sec. 5.6, we note that $f(x_1,x_2,y;\theta)$ can be written as $\theta^{x_1+x_2+y}(1-\theta)^{3-x_1-x_2-y}$. Setting $z = x_1 + x_2$, we have $f(x_1,x_2,y;\theta) = \theta^{z+y}(1-\theta)^{3-(z+y)}$. Thus property 1 is satisfied with $u = 1$, $z_1 = x_1 + x_2$, $A(x_1,x_2) = 1$. Property 2 is automatically satisfied, since the loss function $W$ does not depend on $x_1$, $x_2$.

As another example, we turn to Example 2 of Sec. 5.6. There $f(x_1,x_2,x_3,y;\theta) = \theta^4 \exp[-\theta(x_1 + x_2 + x_3 + y)]$ if $x_1$, $x_2$, $x_3$, $y$ are all positive. Setting $z = x_1 + x_2 + x_3$, we see that property 1 is satisfied with $u = 1$, $z_1 = x_1 + x_2 + x_3$, $A(x_1,x_2,x_3) = 1$. Property 2 is also satisfied, since the loss is a function only of $y$ and $D$.

Now we turn to the proof that any admissible decision rule can be based purely on a knowledge of $z_1, \ldots, z_u$. Suppose we want to construct a Bayes decision rule relative to $B(\theta)$, where $B(\theta)$ has pdf $b(\theta)$. Then

\[
K(D;x) = \int b(\theta)\left[\int \cdots \int W(x_1, \ldots, x_m, y_1, \ldots, y_n; D)\, f(x_1, \ldots, x_m, y_1, \ldots, y_n; \theta)\, dy_1 \cdots dy_n\right] d\theta
\]
\[
= A(x_1, \ldots, x_m) \int b(\theta)\left[\int \cdots \int V(y_1, \ldots, y_n; D; z_1, \ldots, z_u)\, g(z_1, \ldots, z_u, y_1, \ldots, y_n; \theta)\, dy_1 \cdots dy_n\right] d\theta
\]

We denote

\[
\int b(\theta)\left[\int \cdots \int V(y_1, \ldots, y_n; D; z_1, \ldots, z_u)\, g(z_1, \ldots, z_u, y_1, \ldots, y_n; \theta)\, dy_1 \cdots dy_n\right] d\theta
\]

by $K(D;z)$. Then $K(D;x) = A(x_1, \ldots, x_m)K(D;z)$, and this means that

\[ K(D;x) = \min_{\Delta} K(\Delta;x) \]

if and only if

\[ K(D;z) = \min_{\Delta} K(\Delta;z) \]

Thus any Bayes decision rule not using randomization can be constructed from a knowledge of $z_1, \ldots, z_u$ alone, without knowing $x_1, \ldots, x_m$, since the $D$ that minimizes $K(D;x)$ also minimizes $K(D;z)$. But we have seen in the preceding section that the decision rules not using randomization determine the whole convex set $C$ that is available to us. This implies that a person who knows only the values $z_1, \ldots, z_u$ can construct a decision rule identical (as far as expected losses are concerned) with any decision rule that can be constructed by a person who knows all the values $x_1, \ldots, x_m$. For this reason, it is said that "$z_1, \ldots, z_u$ are sufficient for the decision problem."

If $u$ is much smaller than $m$, as is often the case, it is convenient to restate the whole problem in terms of $z_1, \ldots, z_u$. This is done as follows. Let $Z_1, \ldots, Z_u$ be the same functions of $X_1, \ldots, X_m$ as $z_1, \ldots, z_u$ are of $x_1, \ldots, x_m$. Then the original decision problem can be restated as a decision problem in which we observe $Z_1, \ldots, Z_u$ and then make a decision. Thus, in our first example above, $Z = X_1 + X_2$ has a binomial distribution with parameters 2, $\theta$, so the possible joint distributions of $Z$, $Y$ are given by

                            Possible values of Z
                            0                       1                        2
Possible values of Y   0    $(1-\theta)^3$          $2\theta(1-\theta)^2$    $\theta^2(1-\theta)$
                       1    $\theta(1-\theta)^2$    $2\theta^2(1-\theta)$    $\theta^3$

as $\theta$ varies from 0 to 1. The loss function $W(y;D)$ remains the same.

In our second example above, $Z = X_1 + X_2 + X_3$, and by a calculation similar to that used in the last example of Sec. 3.11, we find that the possible joint pdf's of $Z$, $Y$ are $\tfrac{1}{2}\theta^4 z^2 e^{-\theta(z+y)}$ for $y$, $z$ both positive, $\theta$ varying between 0 and $\infty$. The loss function $W(y;D)$ remains the same.
As a third example, suppose a company has to decide how much of a perishable commodity should be stocked to meet the demand of the coming sales period. Since this demand is the sum of many independent individual demands, it is assumed that the demand has a normal distribution (Sec. 4.7) with unknown mean $\theta_1$ and unknown standard deviation $\theta_2$. The net profit on each unit of the commodity sold is $p_1$ dollars; the net loss on each unit of the commodity unsold at the end of the period is $p_2$ dollars. Before deciding, the company will observe the demands for the commodity in $m$ regions similar to the region it serves, the demands in these $m$ regions being assumed independent, each with the same distribution as the demand for the coming sales period. Assuming that the cost of observing the $m$ regions is negligible, the decision problem has the following structure. $D$ is the amount of the commodity that will be stocked and can be any positive number. $Y$ is the demand that will be observed in the coming period. $X_1, \ldots, X_m$ are the demands that will be observed in the $m$ regions. The loss depends only on $Y$ and $D$, and is given as follows:

\[ W(Y;D) = -p_1 D \quad \text{if } Y > D \]

\[ W(Y;D) = -p_1 Y + p_2(D - Y) \quad \text{if } Y < D \]

The possible joint pdf's are given by

\[
\left(\theta_2\sqrt{2\pi}\right)^{-(m+1)} \exp\left\{-\frac{1}{2\theta_2^2}\left[\sum_{i=1}^{m}(x_i - \theta_1)^2 + (y - \theta_1)^2\right]\right\}
\]

as $\theta_1$, $\theta_2$ vary, with $\theta_2$ always positive. Define $z_1$ as $(1/m)(x_1 + \cdots + x_m)$ and $z_2$ as $(1/m)\sum_{i=1}^{m}(x_i - z_1)^2$. We have

\[
\sum_{i=1}^{m}(x_i - \theta_1)^2 = \sum_{i=1}^{m}\left[(x_i - z_1) + (z_1 - \theta_1)\right]^2
= \sum_{i=1}^{m}(x_i - z_1)^2 + m(z_1 - \theta_1)^2 + 2\sum_{i=1}^{m}(x_i - z_1)(z_1 - \theta_1)
\]

But

\[
\sum_{i=1}^{m}(x_i - z_1)(z_1 - \theta_1) = (z_1 - \theta_1)\sum_{i=1}^{m}(x_i - z_1) = (z_1 - \theta_1)(mz_1 - mz_1) = 0
\]

so that

\[ \sum_{i=1}^{m}(x_i - \theta_1)^2 = mz_2 + m(z_1 - \theta_1)^2 \]

Therefore the joint pdf can be written as

\[
\left(\theta_2\sqrt{2\pi}\right)^{-(m+1)} \exp\left\{-\frac{1}{2\theta_2^2}\left[mz_2 + m(z_1 - \theta_1)^2 + (y - \theta_1)^2\right]\right\}
\]

This shows that $z_1$, $z_2$ are sufficient for this decision problem. $z_1$, $z_2$ are known, respectively, as the "sample mean" and "sample variance" of the "sample" consisting of the numbers $x_1, \ldots, x_m$. Since much of conventional statistical theory assumes normal distributions, our discussion illustrates why the sample mean and the sample variance have such an important role in textbooks on statistical theory.
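The algebraic identity behind this example is easy to check on numbers. The sketch below is only an illustration: the function names and the sample values are assumptions, and the sample variance uses the divisor $m$, as in the definition of $z_2$ above.

```python
def sample_mean_and_variance(xs):
    """Return (z1, z2): the sample mean and the sample variance with divisor m."""
    m = len(xs)
    z1 = sum(xs) / m
    z2 = sum((x - z1) ** 2 for x in xs) / m
    return z1, z2


def sum_sq_direct(xs, theta1):
    """Sum of (x_i - theta1)^2 computed from the individual observations."""
    return sum((x - theta1) ** 2 for x in xs)


def sum_sq_from_sufficient(m, z1, z2, theta1):
    """The same sum computed only from m, z1 and z2."""
    return m * z2 + m * (z1 - theta1) ** 2


if __name__ == "__main__":
    xs = [3.0, 5.0, 10.0]
    z1, z2 = sample_mean_and_variance(xs)
    print(sum_sq_direct(xs, 4.0), sum_sq_from_sufficient(len(xs), z1, z2, 4.0))
```

Because the joint pdf depends on the x's only through this sum, two samples with the same $m$, $z_1$, $z_2$ give the same pdf for every $\theta_1$, $\theta_2$.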


5.14. Selecting One Particular Decision Rule. Most of our discussion so far has been devoted to methods for finding all the admissible decision rules. But since in most problems there are infinitely many admissible decision rules, what further principles can be used to select one particular decision rule from among all the admissible decision rules?

As a matter of fact, we could claim that it is not the statistician's job to select one particular decision rule, but to find all the admissible decision rules. Then the person who will actually incur the loss should select one particular decision rule from among the admissible decision rules presented to him by the statistician. Some general principles for selecting one particular decision rule have been suggested (but none has been universally adopted), and we shall describe two of these principles.
One principle uses so-called "subjective probabilities." Subjective probabilities are probabilities assigned to the possible values of $\theta$ and are supposed to represent the degree of belief that a given $\theta$ represents the true joint distribution. Thus, if there were six possible joint distributions, so that $\theta$ ran from 1 to 6, and if it were felt that the first distribution was twice as "likely" to be the true distribution as any of the other distributions, the subjective probability assigned to $\theta = 1$ would be 2/7 and the subjective probability assigned to each of the other five possible values of $\theta$ would be 1/7. The principle states: Choose a decision rule which is Bayes relative to $b(1), \ldots, b(h)$, where $b(\theta)$ is equal to the subjective probability assigned to the value $\theta$. The rationale for this principle is that if the $b(\theta)$ were real probabilities instead of subjective probabilities, we should want to choose a decision rule $s$ that minimizes $\sum_{\theta=1}^{h} b(\theta)r(\theta;s)$, since this sum would represent the expected loss if the true $\theta$ were chosen by a random device that assigns probabilities $b(1), \ldots, b(h)$ to the possible values of $\theta$. A difficulty in applying this principle is that there are no objective methods given for assigning the subjective probabilities.

A different principle that has been suggested is the "minimax criterion." For any decision rule $s$, let $M(s)$ denote $\max_{\theta} r(\theta;s)$. Thus, in Example 1 of Sec. 5.6, we introduced a decision rule $s_1$ and found that $r(\theta;s_1)$ is given by $500 + 1{,}500\theta - 1{,}000\theta^2$ for $0 \le \theta \le 1$. By differentiation, we find that $r(\theta;s_1)$ is a maximum when $\theta = 3/4$, and $r(3/4;s_1) = (1/2)(2{,}125)$. Therefore $M(s_1) = (1/2)(2{,}125)$. Similarly, for the decision rule $s_2$ introduced in Example 1 of Sec. 5.6, we find $M(s_2) = 1{,}500$. The minimax criterion states: Use a decision rule $s$ which minimizes $M(s)$. Such a rule is called a "minimax decision rule." For example, in Example 1 of Sec. 5.6 we saw that $r(1/2;s) = 1{,}000$ for any decision rule $s$. But then for any decision rule $s$, $M(s) \ge 1{,}000$. Let $t$ denote the decision rule with $t(2;x) = 1$ for all $x$; that is, $t$ chooses decision 2 no matter what the observations are. Then $r(\theta;t) = 1{,}000$ for all $\theta$, and $M(t) = 1{,}000$. Thus $t$ is a minimax decision rule for Example 1 of Sec. 5.6.
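The value of $M(s_1)$ quoted above can be confirmed without calculus. This is a minimal sketch that approximates the maximum over $\theta$ on an evenly spaced grid; the helper `max_risk` and the grid size are illustrative assumptions, and only the risk function $500 + 1{,}500\theta - 1{,}000\theta^2$ is taken from the text.

```python
def r_s1(theta):
    """Expected loss of the decision rule s1 of Example 1, Sec. 5.6."""
    return 500 + 1_500 * theta - 1_000 * theta ** 2


def max_risk(risk, grid_size=1_000):
    """Approximate M(s) = max of risk(theta) for theta on a grid over [0, 1]."""
    thetas = [k / grid_size for k in range(grid_size + 1)]
    best = max(thetas, key=risk)
    return best, risk(best)


if __name__ == "__main__":
    print(max_risk(r_s1))                    # maximizing theta and M(s1)
    print(max_risk(lambda theta: 1_000.0))   # the rule t with constant risk
```

Comparing the two maxima shows why the constant-risk rule is preferred by the minimax criterion to $s_1$ in this example.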

The minimax criterion has been criticized as being too conservative. Thus suppose in a hypothetical problem with five possible joint distributions, we are comparing two decision rules $s_1$ and $s_2$, and the expected losses when using each are given by the following table:

                        θ
                   1     2      3     4     5
r(θ;s_1)           1     8      6     3     1
r(θ;s_2)           0     8.01   2     1     0

$M(s_1) = 8$, $M(s_2) = 8.01$. The minimax criterion would tell us to use $s_1$ rather than $s_2$. But unless we were practically certain that the true $\theta$ is equal to 2, $s_2$ would seem to be a more reasonable decision rule to use, since $r(\theta;s_2) < r(\theta;s_1)$ for all $\theta$ except $\theta = 2$, and even when $\theta = 2$, $r(\theta;s_2)$ is only slightly greater than $r(\theta;s_1)$. However, this is an artificial and extreme example, and in practical applications such situations rarely occur.
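The contrast between the two principles of this section can be made concrete with the table above. The sketch below is illustrative only: the function names are hypothetical, and the equal subjective probabilities of 1/5 are an assumption chosen for the example, not something the text specifies.

```python
# Expected losses r(theta; s) from the table, for theta = 1, ..., 5.
RISKS = {
    "s1": [1, 8, 6, 3, 1],
    "s2": [0, 8.01, 2, 1, 0],
}


def minimax_choice(risks):
    """Pick the rule whose largest expected loss M(s) is smallest."""
    return min(risks, key=lambda name: max(risks[name]))


def bayes_choice(risks, prior):
    """Pick the rule minimizing the sum over theta of b(theta) * r(theta; s)."""
    def average(name):
        return sum(b * r for b, r in zip(prior, risks[name]))
    return min(risks, key=average)


if __name__ == "__main__":
    print(minimax_choice(RISKS))
    print(bayes_choice(RISKS, [0.2] * 5))
```

Under the minimax criterion the first rule is chosen, while any subjective probabilities that do not put nearly all their weight on $\theta = 2$ lead to the second rule.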

In the next chapter, we discuss a method for the direct computation of minimax decision rules. We end this chapter by describing a method (for later use) of recognizing a decision rule as minimax. First let us assume that we are dealing with a decision problem with a finite number $h$ of possible joint distributions. We have the following theorem: If $s$ is a Bayes decision rule relative to $b(1), \ldots, b(h)$, and if $r(\theta;s) = M(s)$ for each and every $\theta$ for which $b(\theta) > 0$, then $s$ is a minimax decision rule.

Proof. Suppose $s$ were not a minimax decision rule. Then there would be a decision rule $t$ with $M(t) < M(s)$. But then

\[
\sum_{\theta=1}^{h} b(\theta)r(\theta;t) \le M(t) < M(s) = \sum_{\theta=1}^{h} b(\theta)r(\theta;s)
\]

which would imply that $s$ is not a Bayes decision rule relative to $b(1), \ldots, b(h)$, a contradiction. This contradiction proves the theorem.

To generalize the preceding theorem to the case of an infinite number of possible distributions, we first define the phrase "$\theta$ is a point of increase of the a priori distribution $B(\theta)$." This phrase means that either $B(\theta)$ assigns a positive probability to the point $\theta$, or else $B(\theta)$ has a derivative $b(\theta)$ at $\theta$ and $b(\theta) > 0$. Then we have the following theorem: If $s$ is a Bayes decision rule relative to $B(\theta)$, and $r(\theta;s) = M(s)$ for every $\theta$ which is a point of increase of $B(\theta)$, then $s$ is a minimax decision rule.

Proof. Suppose $s$ were not a minimax decision rule. Then there would be a decision rule $t$ with $M(t) < M(s)$. Using the notation $R(s;B(\theta))$ introduced in Sec. 5.10, we see that in computing $R(s;B(\theta))$, the only points $\theta$ that matter are the points of increase of $B(\theta)$, since the other points are assigned zero probability by $B(\theta)$. Therefore $R(s;B(\theta)) = M(s)$, since $r(\theta;s) = M(s)$ if $\theta$ is a point of increase of $B(\theta)$. Clearly, $R(t;B(\theta)) \le M(t)$, and therefore $R(t;B(\theta)) < R(s;B(\theta))$. But this implies that $s$ is not a Bayes decision rule relative to $B(\theta)$, a contradiction. This contradiction proves the theorem.


Chapter 6 


LINEAR PROGRAMMING AS A 
COMPUTATIONAL TOOL 


6.1. Introduction. We introduced the concept of a minimax decision rule in Chap. 5 and gave theorems which enable us to recognize a decision rule as minimax, under certain circumstances. However, we still have no direct way of constructing a minimax decision rule. In the present chapter, we shall describe a method for the direct construction of minimax decision rules.

We start by discussing a nonstatistical problem of a type known as a "linear programming problem." Suppose there are three fuel types, with weight, energy content, and cost given by the following table:

                                Fuel type
                              1      2      3
Weight per unit volume       15     10     20
Energy per unit volume       30     25     45
Cost per unit volume          1      0.8    1.4

These fuel types can be mixed in any proportions, and the weight and energy content of the mixture are the sum of the weights and energies of the component fuel types. It is desired to obtain a mixture of 10 units of volume whose total weight is no more than 160 units and total energy is at least 320 units, at minimum possible cost. To put this problem in symbolic form, let $x_1$ denote the volume of fuel type 1 that will be used in the mixture, $x_2$ the volume of fuel type 2, and $x_3$ the volume of fuel type 3. Then we must find the values of $x_1$, $x_2$, $x_3$ which minimize $x_1 + 0.8x_2 + 1.4x_3$ subject to the restrictions $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$, and

\[ x_1 + x_2 + x_3 = 10 \]
\[ 15x_1 + 10x_2 + 20x_3 \le 160 \]
\[ 30x_1 + 25x_2 + 45x_3 \ge 320 \]
By introducing nonnegative "slack" variables $x_4$ and $x_5$, we can turn the last two inequalities into equalities:

\[ 15x_1 + 10x_2 + 20x_3 + x_4 = 160 \]
\[ 30x_1 + 25x_2 + 45x_3 - x_5 = 320 \]

In the general linear programming problem, we have $m$ linear inequalities and/or equalities on our unknowns. Each inequality can be turned into an equality by introducing a "slack" variable into the inequality. Assuming this is done, we then have $m$ linear equations in a certain number, say, $n$, of unknowns $x_1, \ldots, x_n$:

\[ a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \]
\[ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \]
\[ \cdots \]
\[ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \]

The problem is to find nonnegative values for $x_1, \ldots, x_n$ which satisfy the $m$ equations and which minimize a given linear "objective function" $c_1x_1 + c_2x_2 + \cdots + c_nx_n$.

We denote by $G$ the set of all $n$-dimensional points $(x_1, \ldots, x_n)$ such that $x_1 \ge 0$, $x_2 \ge 0$, ..., $x_n \ge 0$, and the values $x_1, x_2, \ldots, x_n$ satisfy the $m$ given equations. We assume that $G$ contains at least one point and that there is a finite value $H$ such that every coordinate of every point in $G$ is smaller than $H$. Both of these assumptions will hold in all the problems we shall encounter. Our linear programming problem is to find a point in $G$ at which the objective function is minimized. The following theorem will be useful.

Theorem. If $(y_1, \ldots, y_n)$ and $(z_1, \ldots, z_n)$ are two different points in $G$ with $y_j = 0$ if $z_j = 0$ and $z_j = 0$ if $y_j = 0$ for all $j$, there is a point $(w_1, \ldots, w_n)$ of $G$ with more zero coordinates than $(y_1, \ldots, y_n)$ and $(z_1, \ldots, z_n)$ and with $c_1w_1 + \cdots + c_nw_n \le c_1y_1 + \cdots + c_ny_n$ and $c_1w_1 + \cdots + c_nw_n \le c_1z_1 + \cdots + c_nz_n$.


Proof. Suppose that exactly $r$ of the y's are positive, and for the sake of definiteness, suppose that $y_1, \ldots, y_r$ are positive, and therefore $y_{r+1} = \cdots = y_n = 0$. Then of course $z_1, \ldots, z_r$ are all positive, and $z_{r+1} = \cdots = z_n = 0$. $r$ must be at least 1, since if $r$ were 0, $(y_1, \ldots, y_n)$ and $(z_1, \ldots, z_n)$ would be the same point. The $n$-dimensional point

\[ (\lambda y_1 + (1-\lambda)z_1, \ldots, \lambda y_r + (1-\lambda)z_r, 0, \ldots, 0) \]

is in $G$ for any $\lambda$ for which $\lambda y_i + (1-\lambda)z_i \ge 0$ for $i = 1, \ldots, r$, since it is easily verified that the coordinates of this point satisfy our $m$ equalities. But $\lambda y_i + (1-\lambda)z_i \ge 0$ means that $\lambda \ge -z_i/(y_i - z_i)$ if $y_i - z_i > 0$ and $\lambda \le -z_i/(y_i - z_i)$ if $y_i - z_i < 0$. Let $A$ denote $\max[-z_i/(y_i - z_i)]$ taken over all $i \le r$ for which $y_i - z_i > 0$, and let $B$ denote $\min[-z_i/(y_i - z_i)]$ taken over all $i \le r$ for which $y_i - z_i < 0$. This means that for any $\lambda$ between $A$ and $B$, the point $(\lambda y_1 + (1-\lambda)z_1, \ldots, \lambda y_r + (1-\lambda)z_r, 0, \ldots, 0)$ is in $G$. Clearly, $A < 0$, and since if $y_i - z_i < 0$, we have

\[ \frac{-z_i}{y_i - z_i} = \frac{1}{1 - y_i/z_i} > 1, \]

therefore $B > 1$. Also, $A$ and $B$ are finite numbers, because otherwise we could push $\lambda$ far enough out to give a coordinate above $H$, violating our assumption. When $\lambda$ is set equal to either $A$ or $B$, at least one of the $r$ quantities $\lambda y_1 + (1-\lambda)z_1, \ldots, \lambda y_r + (1-\lambda)z_r$ is equal to zero. There are three possible cases:

1. $c_1y_1 + \cdots + c_ry_r = c_1z_1 + \cdots + c_rz_r$. In this case, we take as our point $(w_1, \ldots, w_n)$ the point $(Ay_1 + (1-A)z_1, \ldots, Ay_r + (1-A)z_r, 0, \ldots, 0)$, and then we find that the value of the objective function at $(w_1, \ldots, w_n)$ is $A(c_1y_1 + \cdots + c_ry_r) + (1-A)(c_1z_1 + \cdots + c_rz_r) = c_1y_1 + \cdots + c_ry_r = c_1z_1 + \cdots + c_rz_r$, and thus the point $(w_1, \ldots, w_n)$ satisfies the conclusion of our theorem.

2. $c_1y_1 + \cdots + c_ry_r < c_1z_1 + \cdots + c_rz_r$. In this case, we take as our point $(w_1, \ldots, w_n)$ the point $(By_1 + (1-B)z_1, \ldots, By_r + (1-B)z_r, 0, \ldots, 0)$, and then the value of the objective function at $(w_1, \ldots, w_n)$ is $c_1y_1 + \cdots + c_ry_r + (1-B)(c_1z_1 + \cdots + c_rz_r - c_1y_1 - \cdots - c_ry_r)$, which is less than $c_1y_1 + \cdots + c_ry_r$. Thus the point $(w_1, \ldots, w_n)$ satisfies the conclusion of our theorem.

3. $c_1y_1 + \cdots + c_ry_r > c_1z_1 + \cdots + c_rz_r$. In this case, we take as our point $(w_1, \ldots, w_n)$ the point $(Ay_1 + (1-A)z_1, \ldots, Ay_r + (1-A)z_r, 0, \ldots, 0)$, and then the value of the objective function at $(w_1, \ldots, w_n)$ is $c_1z_1 + \cdots + c_rz_r + A(c_1y_1 + \cdots + c_ry_r - c_1z_1 - \cdots - c_rz_r)$, which is less than $c_1z_1 + \cdots + c_rz_r$. Thus the point $(w_1, \ldots, w_n)$ satisfies the conclusion of our theorem.
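The construction used in this proof can be traced on numbers. The following sketch follows the three cases of the proof; the function name and the two sample points are assumptions made for illustration. Both sample points satisfy the equations of the fuel-mixing problem of this chapter, have all coordinates positive, and so have zero coordinates in the same (empty) set of places.

```python
def point_with_more_zeros(y, z, c):
    """Move along lambda*y + (1 - lambda)*z until a coordinate reaches zero.

    y and z must be different points with zeros in exactly the same places.
    It is assumed, as in the proof, that both A and B exist and are finite.
    """
    pairs = [(yi - zi, zi) for yi, zi in zip(y, z) if yi > 0]
    A = max(-zi / d for d, zi in pairs if d > 0)   # A < 0
    B = min(-zi / d for d, zi in pairs if d < 0)   # B > 1
    cy = sum(ci * yi for ci, yi in zip(c, y))
    cz = sum(ci * zi for ci, zi in zip(c, z))
    lam = B if cy < cz else A                      # case 2, else cases 1 and 3
    return [lam * yi + (1 - lam) * zi for yi, zi in zip(y, z)]


if __name__ == "__main__":
    cost = [1, 0.8, 1.4, 0, 0]                     # objective of the fuel problem
    y = [2, 4, 4, 10, 20]                          # cost 10.8
    z = [4, 3, 3, 10, 10]                          # cost 10.6
    print(point_with_more_zeros(y, z, cost))
```

Here case 3 applies, so $\lambda$ is set equal to $A$; the resulting point has its last coordinate equal to zero and a cost no larger than that of either starting point.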

A point $(q_1, \ldots, q_n)$ in $G$ will be said to have the "property U" if no other point of $G$ has zero coordinates in exactly the same locations as the zero coordinates of $(q_1, \ldots, q_n)$. The theorem just proved shows that there is at least one point with the property U at which the objective function is minimized. For suppose that $(y_1, \ldots, y_n)$ is a point in $G$ at which the objective function is minimized. If $(y_1, \ldots, y_n)$ does not have the property U, there is a different point $(z_1, \ldots, z_n)$ in $G$ with zero coordinates in exactly the same places as the zero coordinates of $(y_1, \ldots, y_n)$. Then the theorem tells us there is a point $(w_1, \ldots, w_n)$ in $G$ at which the objective function is minimized, where $(w_1, \ldots, w_n)$ has more zero coordinates than $(y_1, \ldots, y_n)$. If $(w_1, \ldots, w_n)$ does not have the property U, we repeat the process, always getting points with more and more zero coordinates at which the objective function is minimized. This process must terminate, since there are only $n$ coordinates, and at the termination we have a point of $G$ with the property U at which the objective function is minimized.

Next we show that only a finite number of points of $G$ have the property U. For if we specify the locations in which we want zero coordinates, either exactly one point in $G$ has zero coordinates in the specified places, or else no points or more than one point in $G$ has zero coordinates in the specified locations. Only in the first case does our specification lead to a point with the property U, and then it leads to exactly one such point. Since there are $2^n$ different ways of specifying the locations in which we want zero coordinates, there are at most $2^n$ points with the property U, and in most problems, there are far fewer than $2^n$.

No point in $G$ with fewer than $n - m$ zero coordinates can have the property U. For suppose that a point $(y_1, \ldots, y_n)$ with fewer than $n - m$ zero coordinates had the property U. Let $r$ denote the number of positive coordinates of $(y_1, \ldots, y_n)$. Then $r$ is greater than $m$. Without loss of generality, we can assume that $y_1, \ldots, y_r$ are positive and $y_{r+1} = \cdots = y_n = 0$. Then

\[ a_{11}y_1 + a_{12}y_2 + \cdots + a_{1r}y_r = b_1 \]
\[ a_{21}y_1 + a_{22}y_2 + \cdots + a_{2r}y_r = b_2 \]
\[ \cdots \]
\[ a_{m1}y_1 + a_{m2}y_2 + \cdots + a_{mr}y_r = b_m \]

Since $r > m$, there must exist quantities $(q_1, \ldots, q_r)$ with $q_i \ne y_i$ for at least one $i \le r$ and with

\[ a_{11}q_1 + a_{12}q_2 + \cdots + a_{1r}q_r = b_1 \]
\[ a_{21}q_1 + a_{22}q_2 + \cdots + a_{2r}q_r = b_2 \]
\[ \cdots \]
\[ a_{m1}q_1 + a_{m2}q_2 + \cdots + a_{mr}q_r = b_m \]

since otherwise $m$ linear equations would uniquely determine $r$ unknowns, an impossibility if $r > m$. But then a nonzero value for $\lambda$ can be found so that the point $(\lambda q_1 + (1-\lambda)y_1, \ldots, \lambda q_r + (1-\lambda)y_r, 0, \ldots, 0)$ is in $G$, and $\lambda q_i + (1-\lambda)y_i > 0$ for $i = 1, \ldots, r$, which shows that the point $(y_1, \ldots, y_n)$ does not have the property U.

Since only a finite number of points of $G$ have the property U, and since the minimum value of the objective function is achieved at a point with the property U, we can in principle find all the points of $G$ with the property U, compute the value of the objective function at each of these points, and use the point giving the smallest value of the objective function. As an example, we discuss the fuel-mixing example with which we introduced this chapter. In that problem, $n = 5$, $m = 3$, so we have only to examine points with at least two zero coordinates. We get the following list:

$x_1 = 0$, $x_2 = 0$. Then $x_3 = 10$, $x_4 = -40$, point not in $G$.
$x_1 = 0$, $x_3 = 0$. Then $x_2 = 10$, $x_4 = 60$, $x_5 = -70$, point not in $G$.
$x_1 = 0$, $x_4 = 0$. Then $x_2 = 4$, $x_3 = 6$, $x_5 = 50$. This point has the property U. The objective function has the value 11.6.
$x_1 = 0$, $x_5 = 0$. Then $x_2 = 6.5$, $x_3 = 3.5$, $x_4 = 25$. This point has the property U. The objective function has the value 10.1.
$x_2 = 0$, $x_3 = 0$. Then $x_1 = 10$, $x_4 = 10$, $x_5 = -20$, point not in $G$.
$x_2 = 0$, $x_4 = 0$. Then $x_1 = 8$, $x_3 = 2$, $x_5 = 10$. This point has the property U. The objective function has the value 10.8.
$x_2 = 0$, $x_5 = 0$. Then $x_1 = 26/3$, $x_3 = 4/3$, $x_4 = 10/3$. This point has the property U. The objective function has the value 158/15, or approximately 10.53.
$x_3 = 0$, $x_4 = 0$. Then $x_1 = 12$, $x_2 = -2$, point not in $G$.
$x_3 = 0$, $x_5 = 0$. Then $x_1 = 14$, $x_2 = -4$, point not in $G$.
$x_4 = 0$, $x_5 = 0$. Then $x_1 = 10$, $x_2 = -1$, point not in $G$.

From this list, we see that as soon as we set any two coordinates equal to zero, we get either a point not in $G$ or a point with the property U. Therefore no point with more than two coordinates equal to zero will have the property U. There are four points of $G$ with the property U, and we see that the value of the objective function is minimized when $x_1 = 0$, $x_2 = 6.5$, $x_3 = 3.5$. This is the least expensive fuel mixture that meets our volume, weight, and energy specifications.
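The enumeration just carried out by hand can be reproduced mechanically. The sketch below is one possible way to do it, not a method given in the text: it sets each pair of coordinates to zero, solves the remaining three equations exactly with Python's `fractions.Fraction`, and keeps the solutions with no negative coordinate. The helper names `solve` and `property_u_points` are hypothetical, and the approach relies on the fact, observed in the list above, that each choice of two zero coordinates determines the other three uniquely in this problem.

```python
from fractions import Fraction
from itertools import combinations

# Equality form of the fuel-mixing problem (x4 and x5 are the slack variables).
A = [[1, 1, 1, 0, 0],
     [15, 10, 20, 1, 0],
     [30, 25, 45, 0, -1]]
b = [10, 160, 320]
c = [Fraction(1), Fraction(4, 5), Fraction(7, 5), Fraction(0), Fraction(0)]


def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def property_u_points():
    """Return (point, objective value) for each nonnegative solution found."""
    points = []
    for zeros in combinations(range(5), 2):
        free = [j for j in range(5) if j not in zeros]
        sol = solve([[row[j] for j in free] for row in A], b)
        if sol is None or min(sol) < 0:
            continue  # no unique solution, or the point is not in G
        x = [Fraction(0)] * 5
        for j, v in zip(free, sol):
            x[j] = v
        points.append((x, sum(ci * xi for ci, xi in zip(c, x))))
    return points


if __name__ == "__main__":
    for x, value in property_u_points():
        print([str(v) for v in x], float(value))
```

Taking the minimum of the objective values over the returned points picks out the least expensive mixture, in agreement with the hand computation.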

It is easily seen that handling more complicated problems by the complete enumeration of the points with the property U, as in the preceding example, would be prohibitively long. In the next section we describe a more efficient method for finding a point at which the objective function is minimized.


6.2. The Simplex Method for Solving Linear Programming Problems. In all the problems we shall discuss, $n$ will be greater than $m$, and from now on we assume that this is so. Then any point in $G$ with the property U has at least $n - m$ zero coordinates. In some problems, every point in $G$ with the property U has exactly $n - m$ zero coordinates: such problems are especially simple to solve and are called "nondegenerate" problems. Until further notice, we assume that we are dealing with a nondegenerate problem, so that every point in $G$ with the property U has exactly $m$ positive coordinates and $n - m$ zero coordinates.

Suppose that $(g_1, \ldots, g_n)$ is a point in $G$ with the property U. Without loss of generality, we can assume that $g_1, \ldots, g_m$ are positive and $g_{m+1} = \cdots = g_n = 0$. Then the $m$ equations in the $m$ unknowns $x_1, \ldots, x_m$

\[ a_{11}x_1 + \cdots + a_{1m}x_m = b_1 \]
\[ a_{21}x_1 + \cdots + a_{2m}x_m = b_2 \]
\[ \cdots \]
\[ a_{m1}x_1 + \cdots + a_{mm}x_m = b_m \]

have a unique solution: $x_1 = g_1, \ldots, x_m = g_m$, for if there is more than one solution, it is not difficult to show that the point $(g_1, \ldots, g_n)$ does not have the property U. The fact that there is a unique solution implies that the determinant

\[
\begin{vmatrix}
a_{11} & \cdots & a_{1m} \\
a_{21} & \cdots & a_{2m} \\
\vdots & & \vdots \\
a_{m1} & \cdots & a_{mm}
\end{vmatrix}
\]

is not equal to zero. This in turn means that we can solve the equations

\[ a_{11}x_1 + \cdots + a_{1m}x_m = b_1 - a_{1,m+1}x_{m+1} - \cdots - a_{1n}x_n \]
\[ a_{21}x_1 + \cdots + a_{2m}x_m = b_2 - a_{2,m+1}x_{m+1} - \cdots - a_{2n}x_n \]
\[ \cdots \]
\[ a_{m1}x_1 + \cdots + a_{mm}x_m = b_m - a_{m,m+1}x_{m+1} - \cdots - a_{mn}x_n \]

for the quantities $x_1, \ldots, x_m$ in terms of the quantities $x_{m+1}, \ldots, x_n$. Each of the quantities $x_1, \ldots, x_m$ will be a linear function of the quantities $x_{m+1}, \ldots, x_n$, say,

\[ x_i = g_i + h_{i,m+1}x_{m+1} + \cdots + h_{in}x_n \]

for $i = 1, \ldots, m$. (Note that when $x_{m+1} = \cdots = x_n = 0$, $x_i$ must equal $g_i$, to be consistent with our original assumptions.) The objective function $c_1x_1 + \cdots + c_nx_n$ can now be expressed in terms of $x_{m+1}, \ldots, x_n$, say, as $g_0 + d_{m+1}x_{m+1} + \cdots + d_nx_n$. It is convenient to display all these relationships in the form of a "tableau," as follows:

                               Coefficient of:
              Constant     x_{m+1}       x_{m+2}       ...    x_n
x_1           g_1          h_{1,m+1}     h_{1,m+2}     ...    h_{1n}
x_2           g_2          h_{2,m+1}     h_{2,m+2}     ...    h_{2n}
...
x_m           g_m          h_{m,m+1}     h_{m,m+2}     ...    h_{mn}
Objective     g_0          d_{m+1}       d_{m+2}       ...    d_n

It should be emphasized that we chose to express everything in terms of $x_{m+1}, \ldots, x_n$ purely for convenience in writing: in any specific example, different sets of x's will be displayed along the sides and top of the tableau. As a numerical example, the tableau corresponding to the point (0,4,6,0,50) of the fuel-mixing example of Sec. 6.1 is easily found to be

              Constant     x_1      x_4
x_2           4            -0.5     0.1
x_3           6            -0.5     -0.1
x_5           50           -5       -2
Objective     11.6         -0.1     -0.06


Returning now to our general tableau corresponding to the point $(g_1, g_2, \ldots, g_m, 0, \ldots, 0)$, we see that if the values $d_{m+1}, \ldots, d_n$ are all nonnegative, the objective function cannot be decreased by raising any of the quantities $x_{m+1}, \ldots, x_n$ above zero. Thus, if the values $d_{m+1}, \ldots, d_n$ are all nonnegative, the objective function is minimized at the point $(g_1, g_2, \ldots, g_m, 0, \ldots, 0)$. However, if one or more of the values $d_{m+1}, \ldots, d_n$ are negative, the objective function can be decreased by raising any of the quantities $x_{m+1}, \ldots, x_n$ corresponding to a negative $d$. The simplex method raises one of the quantities $x_{m+1}, \ldots, x_n$ corresponding to a negative $d$ as much as possible. If precisely one of the quantities $d_{m+1}, \ldots, d_n$ is negative, there is no choice about which $x$ is to be raised above zero. If more than one of the quantities $d_{m+1}, \ldots, d_n$ is negative, there is a choice as to which $x$ is to be raised above zero. As a rule of thumb, we raise the $x$ corresponding to the negative $d$ which is largest in absolute value. In the numerical example of the preceding paragraph, $x_1$ would be raised above zero.


In our general tableau, suppose d, is the largest negative din absolute 
value, so that we raise x, above zero. х, is to be raised as much as 
possible: suppose x, can be raised to A and no further. By the asip, 
tion made above, А < H < œ. When x, is raised to A, while the put 
the quantities x,,.,, . . ., x, are held at zero, х, becomes g; + E 
і= 1,...,т. We must have g; + ЛА > О for i = 1,..., mM. 
is a restriction on A only if h,;, < 0, in which case A ~ ( —gilhJ. Fro! . 
this, it follows that A = min (~g,/h,,), where the minimum 15 taken ove 
all values of i between 1 and mfor whichh,,isnegative. In our (ишы 
example A = тїп (—4/—0.5, —6/—0.5, —50/—5) = 8. When es 
raised to A, at least one of the quantities x}, . . . , X, becomes zero, an ei 
our assumption of nondegeneracy, exactly one of the quantities Xs. -++ „Хь 
becomes zero. Suppose х, becomes zero, so that А = ~g,/hys- 
numerical example, x, becomes zero.) Now we construct a new tab ish 
with x, shifted to the top and x, shifted to the left column. Weslisingn 
the entries in this new tableau by primes: g’, Лу, dj. In order to deve F 
formulas for g;, hj, d; in terms of g,, /i,;, d,, we first express x, in MU 
Xn Xmas Xm49 e -a Xah: Хуу» +++ XQ aS follows. We know from 
starting tableau that 


Jeau, 


х, = Ee Meme niar cot fx, H H hiia 


Since h,, < 0, we can solve this equation for x,, getting 


х, = 8% 4 1 x haman 2 к=н ШЙ 


r Хи+1 U 
hy hy. kk Ж 


Xn 


Thus we have the entries in the new tableau in the row for X,: 


Bs E 2. а = йы for j # r 
hs Ны Ды 
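Now we express $x_i$ ($1 \le i \le m$; $i \ne r$) in terms of $x_r, x_{m+1}, \ldots, x_{s-1}, x_{s+1}, \ldots, x_n$ by means of the following calculation. From the starting tableau,

$$x_i = g_i + h_{i,m+1}x_{m+1} + \cdots + h_{is}x_s + \cdots + h_{in}x_n$$

Substituting the equation for $x_s$ in terms of $x_r, x_{m+1}, \ldots, x_{s-1}, x_{s+1}, \ldots, x_n$ that we developed above, we find

$$x_i = g_i - \frac{h_{is}g_r}{h_{rs}} + \frac{h_{is}}{h_{rs}}x_r + \left(h_{i,m+1} - \frac{h_{is}h_{r,m+1}}{h_{rs}}\right)x_{m+1} + \cdots + \left(h_{in} - \frac{h_{is}h_{rn}}{h_{rs}}\right)x_n$$

This gives the entries in the new tableau in the row for $x_i$:

$$g'_i = g_i - \frac{h_{is}g_r}{h_{rs}} \qquad h'_{ir} = \frac{h_{is}}{h_{rs}} \qquad h'_{ij} = h_{ij} - \frac{h_{is}h_{rj}}{h_{rs}} \quad \text{for } j \ne r$$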


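Finally, we express the objective function in terms of $x_r, x_{m+1}, \ldots, x_{s-1}, x_{s+1}, \ldots, x_n$ by means of the following calculation. The first tableau gives that the objective function equals $g_0 + d_{m+1}x_{m+1} + \cdots + d_s x_s + \cdots + d_n x_n$. Substituting the equation for $x_s$, we find that the objective function equals

$$g_0 - \frac{d_s g_r}{h_{rs}} + \frac{d_s}{h_{rs}}x_r + \left(d_{m+1} - \frac{d_s h_{r,m+1}}{h_{rs}}\right)x_{m+1} + \cdots + \left(d_n - \frac{d_s h_{rn}}{h_{rs}}\right)x_n$$

This gives the entries in the new tableau in the row for the objective function:

$$g'_0 = g_0 - \frac{d_s g_r}{h_{rs}} \qquad d'_r = \frac{d_s}{h_{rs}} \qquad d'_j = d_j - \frac{d_s h_{rj}}{h_{rs}} \quad \text{for } j \ne r$$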
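The exchange formulas above can be sketched in code. The following is a minimal illustration, not the author's procedure: the dictionary layout, the function names `pivot` and `simplex`, and the small test problem are all assumptions made for this example. It assumes a nondegenerate problem whose objective is bounded below, and it uses exact fractions so the entries can be checked by hand.

```python
from fractions import Fraction


def pivot(rows, obj, r, s):
    """Exchange x_r (leaving, listed on the side) with x_s (entering, on top).

    rows maps each side variable i to (g_i, {j: h_ij}); obj is (g_0, {j: d_j}).
    """
    g_r, h_r = rows[r]
    h_rs = h_r[s]

    def substitute(const, coeffs):
        # Replace x_s by its expression in x_r and the other top variables.
        c_s = coeffs[s]
        new = {r: c_s / h_rs}
        for j, c in coeffs.items():
            if j != s:
                new[j] = c - c_s * h_r[j] / h_rs
        return const - c_s * g_r / h_rs, new

    new_rows = {i: substitute(g, h) for i, (g, h) in rows.items() if i != r}
    # Row for x_s: g'_s = -g_r/h_rs, h'_sr = 1/h_rs, h'_sj = -h_rj/h_rs.
    new_rows[s] = (-g_r / h_rs,
                   {r: 1 / h_rs, **{j: -h / h_rs for j, h in h_r.items() if j != s}})
    return new_rows, substitute(*obj)


def simplex(rows, obj):
    """Pivot until no d is negative (assumes nondegeneracy and boundedness)."""
    while True:
        _, d = obj
        s = min(d, key=d.get)            # column with the most negative d
        if d[s] >= 0:
            return rows, obj
        ratios = {i: -g / h[s] for i, (g, h) in rows.items() if h[s] < 0}
        if not ratios:
            raise ValueError("objective is unbounded below")
        r = min(ratios, key=ratios.get)  # row attaining the minimum ratio
        rows, obj = pivot(rows, obj, r, s)


F = Fraction
# Hypothetical problem: x1 = 4 - x3 - 2*x4, x2 = 6 - 3*x3 - x4, minimize -2*x3 - 3*x4.
rows = {1: (F(4), {3: F(-1), 4: F(-2)}), 2: (F(6), {3: F(-3), 4: F(-1)})}
obj = (F(0), {3: F(-2), 4: F(-3)})
final_rows, final_obj = simplex(rows, obj)
print(final_obj[0], {i: g for i, (g, _) in final_rows.items()})
```

In this illustrative problem two exchanges are needed; the final tableau has no negative $d$, and the side variables give the minimizing point.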


In our numerical example, x, is x» and x, is ху, so the second tableau is 


х 
" (—0.5)(0.1) 
Ny ے‎ 
з 05 0.2 
B (50. 
ы p 70. 2—05 
Objective (0.4) _ || 0: : 0.06 (ODD оов 


—0.5 ~0 0.5 


The new tableau corresponds to the following point in G: $x_i = 0$ if $x_i$ is listed along the top of the tableau; $x_i = g'_i$ if $x_i$ is listed along the side of the tableau. This point has the property U, since the x's listed along the top uniquely determine the x's along the side. Also, this point is an improvement over the point corresponding to the original tableau, since the objective function has been lowered in moving to the new tableau.

If no $d'$ in the new tableau is negative, the objective function is minimized at the point in G corresponding to the new tableau. But if at least one $d'$ in the new tableau is negative, we can decrease the objective function by moving to a third tableau, using exactly the same method we employed to move from the first to the second tableau. We keep doing this until we reach a tableau which has no negative d's and thus represents a point at which the objective function is minimized. Such a tableau will be reached in a finite number of steps, since each tableau corresponds to a point with the property U and there is a finite number of such points.
Thus, in the second tableau of our numerical example, d; is пера 

we can lower our objective function by increasing x,. As we Lue 
both x, and x; decrease, x; reaching zero first. Therefore, in thg The 
tableau, x; will appear on top and x, will appear along the side. 

next tableau is 


100.2) 26 
pc, 


26 _„ _ 00.2) А 
3 


Е —3 


(10(-02) 1 
оз ^3 


Xa : "3 1 —3 
—10 10 


~3 3 


(—0.08)(10) 316 


(—0.08)(10) 
—3 30 ч 


3 


Objective 


0.2 - 


ONES Я increasin 
We see that the objective function can be decreased further by increa 5 


: as our 
хз above zero. As x, increases, X, decreases. Therefore we get as 
next tableau 


This tableau gives the final solution, since it is not possible to decrease the objective function by increasing either $x_1$ or $x_5$. Thus at the point (0, 6.5, 3.5, 25, 0) the objective function is minimized. This confirms what we found above, in the enumeration of all points with the property U.

We have finished our description of the main features of the simplex method, which moves from tableau to tableau by a systematic computational procedure. Each tableau represents a point in G with the property U. One question that we have not yet discussed is how the first tableau is found. In our simple numerical example it was easy to find a starting tableau, but in more elaborate problems it can be quite a task. As we shall see, whenever the simplex technique is applied to a statistical decision problem, it will be a simple matter to find a starting tableau.
For the sake of completeness, however, we sketch a method for finding a starting tableau that will work for any linear programming problem. The starting tableau is found by first solving the following linear programming problem: Find the nonnegative quantities $x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+m}$ which minimize $x_{n+1} + \cdots + x_{n+m}$ and which satisfy the $m$ equations:

$$a_{11}x_1 + \cdots + a_{1n}x_n \pm x_{n+1} = b_1$$
$$a_{21}x_1 + \cdots + a_{2n}x_n \pm x_{n+2} = b_2$$
$$\cdots$$
$$a_{m1}x_1 + \cdots + a_{mn}x_n \pm x_{n+m} = b_m$$

where in the $i$th equation $+x_{n+i}$ is used if $b_i > 0$, $-x_{n+i}$ is used if $b_i < 0$. (If $b_i = 0$, $x_{n+i}$ is not introduced into the problem.) For this linear programming problem, a starting tableau can be written immediately by expressing $x_{n+1}, \ldots, x_{n+m}$ in terms of $x_1, \ldots, x_n$. This tableau corresponds to the point $x_1 = \cdots = x_n = 0$, $x_{n+1} = |b_1|, \ldots, x_{n+m} = |b_m|$. The final tableau for the linear programming problem will obviously represent a point where $x_{n+1} = \cdots = x_{n+m} = 0$, since the problem was to minimize $x_{n+1} + \cdots + x_{n+m}$. But then the coordinates $x_1, \ldots, x_n$ of this point satisfy the $m$ equalities of the original linear programming problem, and only $m$ of the quantities $x_1, \ldots, x_n$ are positive. Thus the quantities $x_1, \ldots, x_n$ represent a point with the property U for our original linear programming problem. The tableau for the original problem corresponding to this point $x_1, \ldots, x_n$ can then be constructed.
All the computations described so far are based on the supposition that our linear programming problem is nondegenerate; that is, every point in G with the property U has exactly $m$ positive coordinates and has $n - m$ zero coordinates. What happens if this is not so? We may then get a tableau in which one or more of the $g_i$'s are zero ($i \ne 0$). Then, even though one of the d's is negative, it may be impossible to increase the corresponding x above zero because an x on the left corresponding to a zero g may be made negative, which is not allowed. One way to handle this situation is arbitrarily to increase the zero g's by very small amounts.

This of course changes the original problem to a new problem, but the new problem is nondegenerate and is very close to the original problem. When we get a solution for the new problem, we can usually recognize the solution to the original problem: If an x in the solution to the new problem is very close to zero, the corresponding x in the solution to the original problem is zero. For details, we refer the reader to a text on linear programming.

6.3. Application of Linear Programming to Statistical Decision Problems. Suppose we want to construct a minimax decision rule $s$ for a decision problem in which there is a finite number $L$ of possible decisions, a finite number $h$ of possible distributions, and each distribution allows a finite number of possible values for $x$. Denote by $q$ the total number of different possible values for $x$ allowed by the whole set of distributions. We assume that $W(y;D;x) \ge 0$ for all $y$, $D$, $x$. If necessary, we can add a positive constant $C$ to $W(y;D;x)$ to make this so. The addition of $C$ to $W(y;D;x)$ simply replaces $r(\theta;s)$ by $r(\theta;s) + C$ for all $\theta$ and does not change the minimax decision rule. Under this assumption, $M(s) \ge 0$.

From formulas developed in Chap. 5,

$$r(\theta;s) = \sum_D \sum_x \sum_y W(y;D;x) f(x,y;\theta) s(D;x)$$

We denote $\sum_y W(y;D;x) f(x,y;\theta)$ by $A(D;x;\theta)$. Thus

$$r(\theta;s) = \sum_D \sum_x A(D;x;\theta) s(D;x)$$

Define quantities $z_1, \ldots, z_h$ by the equations

$$\sum_D \sum_x A(D;x;1) s(D;x) + z_1 = M(s)$$
$$\sum_D \sum_x A(D;x;2) s(D;x) + z_2 = M(s)$$
$$\cdots$$
$$\sum_D \sum_x A(D;x;h) s(D;x) + z_h = M(s)$$

Since $r(\theta;s) \le M(s)$ for all $\theta$, $z_1, z_2, \ldots, z_h$ are all nonnegative. $A(D;x;\theta)$ are known quantities for all $D$, $x$, and $\theta$. We set up a linear programming problem as follows. The unknowns are $M(s)$, $z_1, \ldots, z_h$, and $s(D;x)$ for all $D$, $x$. There are $1 + h + Lq$ in all. The problem is to find nonnegative values for these unknowns which satisfy the $q + h$ equations

$$s(1;x) + s(2;x) + \cdots + s(L;x) = 1 \quad \text{(one equation for each } x\text{)}$$
$$\sum_D \sum_x A(D;x;\theta) s(D;x) + z_\theta - M(s) = 0 \quad \text{(one equation for each } \theta\text{)}$$

and which minimize $M(s)$. The values of $s(D;x)$ that we get from this computation give a minimax decision rule $s$.
For the linear programming problem just described, we know that any point with the property U must have at least $1 + h + Lq - q - h = 1 + q(L - 1)$ zero coordinates. We can construct the following point with the property U:

$$s(1;x) = 1 \quad \text{for all } x$$
$$s(2;x) = \cdots = s(L;x) = 0 \quad \text{for all } x$$
$$M(s) = \max\left\{\sum_x A(1;x;1), \ldots, \sum_x A(1;x;h)\right\}$$
$$z_\theta = M(s) - \sum_x A(1;x;\theta) \quad \text{for } \theta = 1, \ldots, h$$

Note that $q(L - 1)$ of the quantities $s(D;x)$ have been set equal to zero, and at least one of the quantities $z_1, \ldots, z_h$ has been set equal to zero by our choice for the value of $M(s)$. Of course, there are many other simple ways to find a point with the property U. Once we have the point, the corresponding tableau is easily constructed.

As a numerical example, suppose a store which has a storage tank available that can hold only one kind of fertilizer at a time has to decide whether to fill the tank with fertilizer of type 1 or of type 2. In either case, the tank holds enough for 100 sales. There are many consumers in the region, and the customers who will actually buy the fertilizer are assumed to be 100 consumers drawn at random from all the consumers in the region. The consumers are broken into two types, and the price charged depends on the consumer type and fertilizer type. The net profit on each sale is given by the following table:

                         Customer type
                           1       2
Fertilizer type     1     100      40
                    2      30     120

The proportion $p$ of consumers in the region who are of type 1 is not known exactly, but from certain census data is known to be either 0.2, 0.5, or 0.8. Before deciding which fertilizer type to stock, the store will choose two consumers at random and observe which types these two consumers are. The problem is to find a minimax decision rule.

We set up the following notation. $Y$ denotes the number of the 100 consumers who will buy the fertilizer who will be of customer type 1. From the description above, it is reasonable to assume that $Y$ has a binomial distribution with parameters 100, $p$. $X_1$ is defined to be 1 if the first observed consumer is of type 1; 0 if the first observed consumer is of type 2. $X_2$ is defined to be 1 if the second observed consumer is of type 1; 0 if the second observed consumer is of type 2. $X_1$, $X_2$, $Y$ are all independent, and each $X$ has the probability distribution

Possible value      0        1
Probability       1 - p      p

We assume that there is no cost connected with observing $X_1$ and $X_2$. Then the loss does not depend on $X_1$ or $X_2$, and since

$$f(x_1,x_2,y;p) = p^{x_1+x_2}(1-p)^{2-x_1-x_2}\,\frac{100!}{y!\,(100-y)!}\,p^y(1-p)^{100-y}$$

we see that $X_1 + X_2$ is sufficient for this problem. Denote $X_1 + X_2$ by $T$. Then

$$f(t,y;p) = \frac{2!}{t!\,(2-t)!}\,p^t(1-p)^{2-t}\,\frac{100!}{y!\,(100-y)!}\,p^y(1-p)^{100-y}$$

where the possible values for $T$ are 0, 1, 2. We denote the decision to stock fertilizer type 1 as decision 1 and the decision to stock fertilizer type 2 as decision 2. Then

W(Y;1) = -100Y - 40(100 - Y) = -60Y - 4,000
W(Y;2) = -30Y - 120(100 - Y) = 90Y - 12,000

To ensure that W(Y;D) is never negative, we add 12,000 to each W(Y;D), using for the rest of the computation

W(Y;1) = 8,000 - 60Y
W(Y;2) = 90Y
We set $\theta = 1$ to indicate that $p = 0.2$; $\theta = 2$ to indicate that $p = 0.5$; and $\theta = 3$ to indicate that $p = 0.8$. In computing $A(D;t;\theta)$ we note that $\sum_y W(y;1) f(t,y;p)$ is equal to

$$\frac{2!}{t!\,(2-t)!}\,p^t(1-p)^{2-t} \sum_{y=0}^{100} [8{,}000 - 60y]\,\frac{100!}{y!\,(100-y)!}\,p^y(1-p)^{100-y} = \frac{2!}{t!\,(2-t)!}\,p^t(1-p)^{2-t}(8{,}000 - 6{,}000p)$$

and

$$\sum_y W(y;2) f(t,y;p) = \frac{2!}{t!\,(2-t)!}\,p^t(1-p)^{2-t}\,9{,}000p$$

This gives

A(1;0;1) = 4,352    A(2;0;1) = 1,152
A(1;1;1) = 2,176    A(2;1;1) = 576
A(1;2;1) = 272      A(2;2;1) = 72

A(1;0;2) = 1,250    A(2;0;2) = 1,125
A(1;1;2) = 2,500    A(2;1;2) = 2,250
A(1;2;2) = 1,250    A(2;2;2) = 1,125

A(1;0;3) = 128      A(2;0;3) = 288
A(1;1;3) = 1,024    A(2;1;3) = 2,304
A(1;2;3) = 2,048    A(2;2;3) = 4,608
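The tabulated quantities $A(D;t;\theta)$ can be recomputed directly from the definition $\sum_y W(y;D) f(t,y;p)$. The sketch below does so in exact arithmetic; the function names `W`, `f`, `A` and the dictionary `table` are choices made for this illustration only.

```python
from fractions import Fraction
from math import comb


def W(y, D):
    """Loss after adding 12,000: 8,000 - 60y for decision 1, 90y for decision 2."""
    return 8000 - 60 * y if D == 1 else 90 * y


def f(t, y, p):
    """Joint probability of T = t (two observed consumers) and Y = y (100 buyers)."""
    return (comb(2, t) * p**t * (1 - p)**(2 - t)
            * comb(100, y) * p**y * (1 - p)**(100 - y))


def A(D, t, p):
    """A(D;t;theta): sum over y of W(y;D) f(t,y;p)."""
    return sum(W(y, D) * f(t, y, p) for y in range(101))


P = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(4, 5)}  # theta -> p
table = {(D, t, th): A(D, t, p)
         for th, p in P.items() for D in (1, 2) for t in (0, 1, 2)}
for key in sorted(table, key=lambda k: (k[2], k[1], k[0])):
    print(key, table[key])
```

Summing over all 101 values of $y$ is slower than using the closed forms above, but it checks those closed forms rather than assuming them.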
We start our construction of a point with the property U by setting the $s(D;t)$ as follows:

s(2;0) = 1    s(1;0) = 0
s(2;1) = 1    s(1;1) = 0
s(1;2) = 1    s(2;2) = 0

Using these values, we find

$$\sum_D \sum_t A(D;t;1) s(D;t) = 1{,}152 + 576 + 272 = 2{,}000$$
$$\sum_D \sum_t A(D;t;2) s(D;t) = 1{,}125 + 2{,}250 + 1{,}250 = 4{,}625$$
$$\sum_D \sum_t A(D;t;3) s(D;t) = 288 + 2{,}304 + 2{,}048 = 4{,}640$$

Thus we set $M(s) = 4{,}640$, so $z_3 = 0$, $z_2 = 15$, $z_1 = 2{,}640$. Therefore the tableau corresponding to this point expresses $s(2;0)$, $s(2;1)$, $s(1;2)$, $z_1$, $z_2$, and $M(s)$ in terms of $s(1;0)$, $s(1;1)$, $s(2;2)$, and $z_3$. This tableau is

                    s(1;0)    s(1;1)    s(2;2)    z_3

z_1      2,640     -3,360    -2,880     2,760      1
z_2         15       -285    -1,530     2,685      1
M(s)     4,640       -160    -1,280     2,560      1

Since the objective function is $M(s)$, it is not necessary to add a separate row for the objective function. We raise $s(1;1)$ to 15/1,530, making $z_2$ equal to zero, and thus getting as our second tableau (entries reduced to lowest terms):

                        s(1;0)        s(2;2)        z_3         z_2

z_1       44,400/17    -48,000/17    -39,000/17    -15/17       32/17
s(1;1)        1/102       -19/102       179/102     1/1,530    -1/1,530
M(s)     236,000/51      4,000/51     16,000/51     25/153     128/153

We see that no further improvement in $M(s)$ is possible. Thus a minimax decision rule $s$ is given by $s(1;0) = 0$, $s(2;0) = 1$, $s(1;1) = 15/1{,}530$, $s(2;1) = 1{,}515/1{,}530$, $s(1;2) = 1$, $s(2;2) = 0$. For this decision rule $s$, $M(s) = 4{,}640 - 1{,}280/102 - 12{,}000$ (subtracting the 12,000 that we added at the beginning of our computation).
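The result can be checked numerically. The sketch below evaluates $r(\theta;s)$ for the rule just found and compares its maximum risk with every rule on a coarse grid of values for $s(1;0)$, $s(1;1)$, $s(1;2)$. The grid search is only a sanity check added for illustration, not part of the simplex computation, and the names `risks`, `rule`, and `grid` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A[theta][(D, t)] as tabulated in the text (12,000 already added to the loss).
A = {
    1: {(1, 0): 4352, (1, 1): 2176, (1, 2): 272, (2, 0): 1152, (2, 1): 576, (2, 2): 72},
    2: {(1, 0): 1250, (1, 1): 2500, (1, 2): 1250, (2, 0): 1125, (2, 1): 2250, (2, 2): 1125},
    3: {(1, 0): 128, (1, 1): 1024, (1, 2): 2048, (2, 0): 288, (2, 1): 2304, (2, 2): 4608},
}


def risks(s1):
    """r(theta; s) for theta = 1, 2, 3, where s1[t] = s(1;t) and s(2;t) = 1 - s1[t]."""
    return [sum(A[th][(1, t)] * s1[t] + A[th][(2, t)] * (1 - s1[t]) for t in range(3))
            for th in (1, 2, 3)]


rule = (Fraction(0), Fraction(15, 1530), Fraction(1))
M = max(risks(rule))

# Compare against all rules with s(1;t) in {0, 0.05, ..., 1}.
grid = [Fraction(k, 20) for k in range(21)]
best_on_grid = min(max(risks(s1)) for s1 in product(grid, repeat=3))

print(M, float(M) - 12000, best_on_grid >= M)
```

For this rule the risks under $\theta = 2$ and $\theta = 3$ coincide and exceed the risk under $\theta = 1$, which is the equalization one expects at a minimax solution.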


6.4. Finding Approximately Minimax Decision Rules. In the preceding section, we have seen that linear programming can be used to construct a minimax decision rule in a problem with a finite number of decisions, distributions, and possible values for $x$. If these conditions are not satisfied, linear programming cannot be used. However, under certain circumstances, we can find a decision rule which is approximately minimax by the use of linear programming.

For example, suppose that the possible distributions are those given as a parameter $\theta$ varies continuously between 0 and 1. Suppose we arbitrarily assume that the only possible distributions are those given by a finite number of values of $\theta$ spaced equally over the interval (0,1). If $r(\theta;s)$ is a continuous function of $\theta$, and if our finite number of values of $\theta$ are spaced closely enough together, a minimax decision rule for the finite problem will be close to a minimax decision rule for the original problem.

As another example, suppose our decision is to be based on a chance variable $X$ which can vary continuously over an interval from $A$ to $B$. Suppose we modify the problem by breaking the interval $(A,B)$ into small subintervals and, whenever $X$ falls into the $i$th subinterval, arbitrarily place its value at the midpoint of this subinterval, say, $a_i$. This changes the problem into one where the possible values of $X$ are the midpoints $a_1, a_2, \ldots$ of the subintervals. [Under the $\theta$th distribution, $P(X = a_i)$ is equal to the probability assigned to the $i$th subinterval by the $\theta$th distribution of the original problem.] If the subintervals are small, a minimax decision rule for the new finite problem will be close to a minimax decision rule for the original problem.

In each of the cases discussed, the number of unknowns in the resulting linear programming problem is likely to be large. The use of modern computing equipment would be necessary.
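The second approximation, replacing a continuous chance variable by the midpoints of subintervals, can be sketched as follows. The function name `discretize`, the use of equal-width subintervals, and the example cdf $F(x) = x^2$ on (0,1) are assumptions made for this illustration.

```python
def discretize(cdf, A, B, k):
    """Split (A, B) into k equal subintervals.

    Returns the midpoints a_i and the probability that the given cdf
    assigns to each subinterval.
    """
    width = (B - A) / k
    edges = [A + i * width for i in range(k + 1)]
    midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    probs = [cdf(hi) - cdf(lo) for lo, hi in zip(edges, edges[1:])]
    return midpoints, probs


# Hypothetical distribution on (0, 1) with cdf F(x) = x**2, split into 4 pieces.
midpoints, probs = discretize(lambda x: x**2, 0.0, 1.0, 4)
print(midpoints, probs)
```

Repeating this for each possible distribution gives the finite table of probabilities $P(X = a_i)$ that the linear programming formulation of Sec. 6.3 requires.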
Chapter 7


PROBLEMS INVOLVING A SEQUENCE
OF DECISIONS OVER TIME


7.1. Introduction. Many important problems involve making a sequence of decisions at different times, rather than making a decision at only one time, as in the problems we have been discussing up to now. Then at a given time we must take into account the effect of the decision we choose on the whole future duration of the problem; that is, we cannot simply choose the decision which looks best for the immediate future, for such a decision may cause grave losses in the more distant future. We can compare the problem with that faced by a mountain climber confronted by a choice between two paths. One path may look steep and stony, and the other may look gradual and smooth, but the more pleasant-looking path may finally lead to a sheer cliff which is impossible to climb, while the stony path may in fact provide a reasonable way to the summit. In making any decision, we must take into account the whole future, not just the immediate future.

Some problems involve making decisions about which variables $X_1, \ldots, X_m$ to observe, or how many variables to observe. To distinguish such decisions from the sort of decision we have been discussing up to now, we shall call them "sampling decisions."


7.2. Problems Where There Is One Time When a Sampling Decision Is to Be Made. In problems where a sampling decision is to be made at one time only, the sampling decision chooses the chance variables on which the regular decision is to be based. (The sampling decision may be to observe no chance variables.)

Once the sampling decision has been made, so that the chance variables on which the regular decision is to be based are specified, we have reached a decision problem of the type we discussed in Chaps. 5 and 6, and all the techniques developed there may be used. A complete decision rule must specify how the sampling decision is to be made, and how the regular decision is to be made on the basis of the chance variables specified by the sampling decision. To find a minimax decision rule, we proceed as follows. For any sampling decision $d$, we can find a minimax decision rule $s_d$ for the remaining part of the problem. As usual, $M(s_d)$ denotes $\max_\theta r(\theta;s_d)$. Then an over-all minimax decision rule is given as follows: Choose the sampling decision $d$ that minimizes $M(s_d)$, and then use $s_d$ for the remaining part of the problem.

As a numerical example, we modify the numerical example of Sec. 6.3 so that the net profit on each sale is given by the following table:

                         Customer type
                           1       2
Fertilizer type     1     100      40
                    2      40     100

Also, the proportion $p$ of consumers in the region who are of type 1 is known to be either 0.2 or 0.8. Before deciding which fertilizer to stock, the store can choose any desired number $m$ of consumers at random and observe which types these consumers are. However, it costs $100 to observe each consumer. The problem is to find a minimax decision rule.

In solving this problem, we note that the specification of $m$ is part of the decision rule. Our first step is to find a minimax decision rule for each possible value of $m$. As in Sec. 6.3, we let $T$ denote the number of consumers of type 1 among the $m$ consumers observed. The loss function is

W(Y;1) = 100m - 100Y - 40(100 - Y) = 100m - 60Y - 4,000
W(Y;2) = 100m - 40Y - 100(100 - Y) = 100m + 60Y - 10,000


As in Sec. 6.3, it is easily verified that $T$ is sufficient and that

$$f(t,y;p) = \frac{m!}{t!\,(m-t)!}\,p^t(1-p)^{m-t}\,\frac{100!}{y!\,(100-y)!}\,p^y(1-p)^{100-y}$$

Next we construct a Bayes decision rule relative to 1/2, 1/2. We have

$$K(1;t) = \frac{m!}{t!\,(m-t)!}\left[100m - 2{,}600(0.2)^t(0.8)^{m-t} - 4{,}400(0.8)^t(0.2)^{m-t}\right]$$

$$K(2;t) = \frac{m!}{t!\,(m-t)!}\left[100m - 4{,}400(0.2)^t(0.8)^{m-t} - 2{,}600(0.8)^t(0.2)^{m-t}\right]$$

From this it is easily found that

K(1;t) < K(2;t) if t > (1/2)m

K(1;t) = K(2;t) if t = (1/2)m

K(1;t) > K(2;t) if t < (1/2)m

Therefore the decision rule $s_m$ defined as follows is a Bayes decision rule relative to 1/2, 1/2: $s_m(1;t) = 1$ if $t > (1/2)m$, $s_m(1;t) = 1/2$ if $t = (1/2)m$, $s_m(1;t) = 0$ if $t < (1/2)m$. (Of course, a value of $t$ equal to $(1/2)m$ could be observed only if $m$ is an even number.) Then if $m$ is odd, we find

$$r(0.2;s_m) = 100m - 8{,}800 \sum_{t=0}^{(m-1)/2} \frac{m!}{t!\,(m-t)!}(0.2)^t(0.8)^{m-t} - 5{,}200 \sum_{t=(m+1)/2}^{m} \frac{m!}{t!\,(m-t)!}(0.2)^t(0.8)^{m-t}$$

$$r(0.8;s_m) = 100m - 5{,}200 \sum_{t=0}^{(m-1)/2} \frac{m!}{t!\,(m-t)!}(0.8)^t(0.2)^{m-t} - 8{,}800 \sum_{t=(m+1)/2}^{m} \frac{m!}{t!\,(m-t)!}(0.8)^t(0.2)^{m-t}$$

Since

$$\sum_{t=0}^{(m-1)/2} \frac{m!}{t!\,(m-t)!}(0.2)^t(0.8)^{m-t} = \sum_{t=(m+1)/2}^{m} \frac{m!}{t!\,(m-t)!}(0.8)^t(0.2)^{m-t}$$

we have $r(0.2;s_m) = r(0.8;s_m) = M(s_m)$, and thus $s_m$ is a minimax decision rule for the given $m$, by the theorem of Sec. 5.14. If $m$ is even, we find

$$r(0.2;s_m) = 100m - 8{,}800 \sum_{t=0}^{m/2-1} \frac{m!}{t!\,(m-t)!}(0.2)^t(0.8)^{m-t} - 5{,}200 \sum_{t=m/2+1}^{m} \frac{m!}{t!\,(m-t)!}(0.2)^t(0.8)^{m-t} - \tfrac{1}{2}(5{,}200 + 8{,}800)\frac{m!}{(m/2)!\,(m-m/2)!}(0.2)^{m/2}(0.8)^{m/2}$$

$$r(0.8;s_m) = 100m - 5{,}200 \sum_{t=0}^{m/2-1} \frac{m!}{t!\,(m-t)!}(0.8)^t(0.2)^{m-t} - 8{,}800 \sum_{t=m/2+1}^{m} \frac{m!}{t!\,(m-t)!}(0.8)^t(0.2)^{m-t} - \tfrac{1}{2}(5{,}200 + 8{,}800)\frac{m!}{(m/2)!\,(m-m/2)!}(0.8)^{m/2}(0.2)^{m/2}$$

and we find as above that $r(0.2;s_m) = r(0.8;s_m)$, and thus $s_m$ is a minimax decision rule for the given $m$.

We have succeeded in finding a minimax decision rule for each possible value of $m$. With the help of the formulas developed above and a table of the binomial distribution, we can find the numerical value of $M(s_m)$ for each $m$. These values are given in the following table:

m      M(s_m)

1      -7,980
2      -7,880
3      -8,125.6
4      -8,025.6
5      -8,091.56
6      -7,991.38
7      -7,980.12

From this table, it can be seen that the best choice of $m$ is 3. Thus an over-all minimax decision rule for the problem can be described as follows: Choose three consumers at random, and stock fertilizer type 2 if there are 0 or 1 consumers of type 1 among the three consumers observed; stock fertilizer type 1 if there are 2 or 3 consumers of type 1 among the three consumers observed.

7.3. Sequential Sampling Problems. A sequential sampling problem is a problem in which a sampling decision must be made at more than one time. Most of these problems are very complicated, and much research remains to be done in this area. We shall discuss one of the simplest problems of this type, in order to illustrate some of the techniques which have been found useful.

We assume that the chance variables $X_1, X_2, \ldots$ are distributed independently of each other and of the chance variables $Y_1, \ldots, Y_n$. The cdf's of $X_1, X_2, \ldots$ are all the same, and the common cdf is known to be one of the two given cdf's $F_1(x)$ or $F_2(x)$. If the common cdf of the X's is $F_1(x)$, then the joint cdf of $Y_1, \ldots, Y_n$ is $G_1(y_1, \ldots, y_n)$, while if the common cdf of the X's is $F_2(x)$, the joint cdf of $Y_1, \ldots, Y_n$ is $G_2(y_1, \ldots, y_n)$. We are allowed the utmost freedom in taking observations: we may observe no X's, or we may observe $X_1$ and then decide whether to observe $X_2$ or stop and choose a regular decision; if we decide to observe $X_2$, we can decide either to observe $X_3$ or to stop and choose a regular decision, etc. There are two possible regular decisions, which we shall label $d_1$, $d_2$. If we observe exactly $m$ of the X's before stopping and choosing a regular decision, the loss incurred is $W(Y_1, \ldots, Y_n; D) + cm$, where $c$ is a given positive constant and $D$ can take the possible values $d_1$, $d_2$. Clearly, this means that it costs us an amount $c$ to observe each X.

Let us denote the expected value of $W(Y_1, \ldots, Y_n; d_j)$ when the joint cdf of $Y_1, \ldots, Y_n$ is $G_i(y_1, \ldots, y_n)$ by $a(G_i;d_j)$. Under any one of the following conditions, no admissible decision rule will observe any X's:

(1) $a(G_1;d_1) \le a(G_1;d_2)$ and $a(G_2;d_1) \le a(G_2;d_2)$

In this case, $d_1$ is at least as good a decision as $d_2$ no matter which distribution is the true one, so we may as well choose $d_1$ without taking any observations and save ourselves the charge of $c$ per observation.

(2) $a(G_1;d_2) \le a(G_1;d_1)$ and $a(G_2;d_2) \le a(G_2;d_1)$

This is the same as situation 1 with the roles of $d_1$ and $d_2$ reversed.

(3) $a(G_i;d_j) < c$ for $i = 1, 2$ and $j = 1, 2$

In this case the cost per observation is so large that it is worthwhile to choose one of the decisions $d_1$ or $d_2$ without taking any observations at all. From now on we assume that none of the situations 1, 2, or 3 holds. Then it is no loss of generality to assume that the decisions are labeled so that

$$a(G_1;d_1) \le a(G_1;d_2) \qquad a(G_2;d_2) \le a(G_2;d_1) \tag{7.1}$$

with at least one of these a strict inequality. Briefly, this means that if $G_1$ is the distribution, we should prefer to choose $d_1$, and if $G_2$ is the distribution, we should prefer to choose $d_2$.

Let $s_1$ denote the decision rule that chooses $d_1$ without observing any X's, and let $s_2$ denote the decision rule that chooses $d_2$ without observing any X's. It is easily seen that

$$r(1;s_1) = a(G_1;d_1) \qquad r(2;s_1) = a(G_2;d_1) \qquad r(1;s_2) = a(G_1;d_2) \qquad r(2;s_2) = a(G_2;d_2) \tag{7.2}$$

We shall prove the following theorem.

Theorem 1. There are values $L_1$, $L_2$, where $0 < L_1 < L_2 < 1$, such that $s_1$ is a Bayes decision rule relative to $b$, $1 - b$ for every $b$ with $L_2 \le b \le 1$; $s_2$ is a Bayes decision rule relative to $b$, $1 - b$ for every $b$ with $0 \le b \le L_1$; and if $L_1 < b < L_2$, then a Bayes decision rule relative to $b$, $1 - b$ certainly observes $X_1$.

Proof. First we note that $s_1$ is certainly a Bayes decision rule relative to 1, 0, since a Bayes decision rule relative to 1, 0 must minimize $r(1;s)$, and the smallest possible value for $r(1;s)$ is $a(G_1;d_1)$, which is achieved by choosing the "right" decision $d_1$ without taking any observations; in other words, by using the decision rule $s_1$. Similarly, it is easily seen that $s_2$ is a Bayes decision rule relative to 0, 1.

Now suppose that there were values $b$ and $b'$, with $b' < b < 1$, such that $s_1$ is a Bayes decision rule relative to $b'$, $1 - b'$ but not relative to $b$, $1 - b$. There must then be a decision rule $s$ such that

$$b\,r(1;s) + (1 - b)\,r(2;s) < b\,r(1;s_1) + (1 - b)\,r(2;s_1) \tag{7.3}$$

On the other hand, we have

$$b'\,r(1;s_1) + (1 - b')\,r(2;s_1) \le b'\,r(1;s) + (1 - b')\,r(2;s) \tag{7.4}$$

We know that $r(1;s_1) \le r(1;s)$, and therefore it follows from (7.3) that $r(2;s_1) > r(2;s)$. Then from (7.4) we find $r(1;s_1) < r(1;s)$. The inequality (7.3) is equivalent to

$$\frac{b}{1 - b} < \frac{r(2;s_1) - r(2;s)}{r(1;s) - r(1;s_1)} \tag{7.5}$$

and the inequality (7.4) is equivalent to

$$\frac{b'}{1 - b'} \ge \frac{r(2;s_1) - r(2;s)}{r(1;s) - r(1;s_1)} \tag{7.6}$$

But (7.5) and (7.6) taken together imply that $b' > b$, which contradicts the assumption that $b' < b$. This means that if $s_1$ is a Bayes decision rule relative to $b'$, $1 - b'$, then $s_1$ is a Bayes decision rule relative to $b$, $1 - b$ for all $b > b'$. Let $L_2$ denote the smallest number with the property that $s_1$ is a Bayes decision rule relative to $b$, $1 - b$ for every $b > L_2$. The discussion above shows that such an $L_2$ exists.

We now show that $s_1$ is a Bayes decision rule relative to $L_2$, $1 - L_2$. For suppose it were not. Then there would be a decision rule $t$ with

$$L_2\,r(1;t) + (1 - L_2)\,r(2;t) < L_2\,r(1;s_1) + (1 - L_2)\,r(2;s_1)$$

But then there would be a value $b'$ slightly above $L_2$ such that

$$b'\,r(1;t) + (1 - b')\,r(2;t) < b'\,r(1;s_1) + (1 - b')\,r(2;s_1)$$

which would contradict the fact that $s_1$ is a Bayes decision rule relative to $b'$, $1 - b'$. This contradiction proves that $s_1$ is a Bayes decision rule relative to $L_2$, $1 - L_2$. From the definition of $L_2$, it is clear that $s_1$ is not a Bayes decision rule relative to $b$, $1 - b$ if $b$ is any value below $L_2$.

In an analogous manner, we can prove the existence of a value $L_1$, such that $s_2$ is a Bayes decision rule relative to $b$, $1 - b$ for every $b \le L_1$ but $s_2$ is not a Bayes decision rule relative to $b$, $1 - b$ if $b$ is any value above $L_1$.
Our next task is to show that $L_1 < L_2$. For suppose this were not so. Then both $s_1$ and $s_2$ would be Bayes decision rules relative to $b$, $1 - b$ for all values $b$ between $L_2$ and $L_1$. Then for any such $b$, we must have

$$b\,r(1;s_1) + (1 - b)\,r(2;s_1) = b\,r(1;s_2) + (1 - b)\,r(2;s_2) \tag{7.7}$$

or, using (7.2),

$$b\,a(G_1;d_1) + (1 - b)\,a(G_2;d_1) = b\,a(G_1;d_2) + (1 - b)\,a(G_2;d_2)$$

which implies that

$$b = \frac{a(G_2;d_1) - a(G_2;d_2)}{a(G_2;d_1) - a(G_2;d_2) + a(G_1;d_2) - a(G_1;d_1)}$$

From (7.1), the denominator of this fraction is positive, and thus (7.7) determines a unique value of $b$. But this contradicts the fact that (7.7) must hold for all $b$ between $L_2$ and $L_1$ and shows that $L_1 < L_2$.

To complete the proof of Theorem 1, we must show that if $L_1 < b < L_2$, then a Bayes decision rule relative to $b$, $1 - b$ certainly observes $X_1$. To show this, let $b$ be a fixed value with $L_1 < b < L_2$, and let $s$ denote a Bayes decision rule relative to $b$, $1 - b$. Let $p_1$ denote the probability assigned by $s$ to choosing $d_1$ without observing $X_1$, and let $p_2$ denote the probability assigned by $s$ to choosing $d_2$ without observing $X_1$. If $p_1 + p_2 = 1$, $s$ certainly chooses a decision without observing $X_1$ and clearly

$$r(1;s) = p_1 a(G_1;d_1) + p_2 a(G_1;d_2) \qquad r(2;s) = p_1 a(G_2;d_1) + p_2 a(G_2;d_2) \tag{7.8}$$

Also

$$b\,r(1;s) + (1 - b)\,r(2;s) < b\,r(1;s_1) + (1 - b)\,r(2;s_1)$$
$$b\,r(1;s) + (1 - b)\,r(2;s) < b\,r(1;s_2) + (1 - b)\,r(2;s_2) \tag{7.9}$$

From (7.2) and (7.8), we find

$$b\,r(1;s) + (1 - b)\,r(2;s) = p_1[b\,r(1;s_1) + (1 - b)\,r(2;s_1)] + p_2[b\,r(1;s_2) + (1 - b)\,r(2;s_2)] \tag{7.10}$$

But (7.9) and (7.10) contradict each other, and this contradiction proves that $p_1 + p_2 < 1$.

Since $p_1 + p_2 < 1$, there is a positive probability $1 - p_1 - p_2$ assigned by the decision rule $s$ to observing $X_1$. Let $t$ denote the decision rule that observes $X_1$ with probability 1, and thereafter makes exactly the same decisions as $s$ does once $s$ observes $X_1$. Then we have

$$r(1;s) = p_1 a(G_1;d_1) + p_2 a(G_1;d_2) + (1 - p_1 - p_2)\,r(1;t)$$
$$r(2;s) = p_1 a(G_2;d_1) + p_2 a(G_2;d_2) + (1 - p_1 - p_2)\,r(2;t) \tag{7.11}$$

From (7.2) and (7.11), we find

$$b\,r(1;s) + (1 - b)\,r(2;s) = p_1[b\,r(1;s_1) + (1 - b)\,r(2;s_1)] + p_2[b\,r(1;s_2) + (1 - b)\,r(2;s_2)] + (1 - p_1 - p_2)[b\,r(1;t) + (1 - b)\,r(2;t)] \tag{7.12}$$

Since

$$b\,r(1;s) + (1 - b)\,r(2;s) < b\,r(1;s_1) + (1 - b)\,r(2;s_1)$$
$$b\,r(1;s) + (1 - b)\,r(2;s) < b\,r(1;s_2) + (1 - b)\,r(2;s_2)$$

it follows from (7.12) that if $p_1 + p_2 > 0$, we would have to have

$$b\,r(1;t) + (1 - b)\,r(2;t) < b\,r(1;s) + (1 - b)\,r(2;s)$$

which contradicts the fact that $s$ is a Bayes decision rule relative to $b$, $1 - b$. This means that $p_1 + p_2 = 0$, so that $s$ certainly observes $X_1$. This completes the proof of Theorem 1.

Before stating Theorem 2, we introduce the following notation. If $F_i(x)$ has a derivative, $f_i(x)$ denotes this derivative. If $F_i(x)$ increases only in jumps, $f_i(x)$ denotes the $P(X = x)$ assigned by $F_i(x)$. Now we prove the following theorem.

Theorem 2. Suppose $b$ is a given value with $L_1 < b < L_2$. Let $t_b$ denote the decision rule described as follows. $t_b$ observes $X_1, \ldots, X_n$, where $n$ is the smallest positive integer for which it is not true that

$$\frac{b}{1 - b}\left(\frac{1}{L_2} - 1\right) < \frac{f_2(X_1) \cdots f_2(X_n)}{f_1(X_1) \cdots f_1(X_n)} < \frac{b}{1 - b}\left(\frac{1}{L_1} - 1\right)$$

If

$$\frac{f_2(X_1) \cdots f_2(X_n)}{f_1(X_1) \cdots f_1(X_n)} \le \frac{b}{1 - b}\left(\frac{1}{L_2} - 1\right)$$

$t_b$ chooses $d_1$. If

$$\frac{f_2(X_1) \cdots f_2(X_n)}{f_1(X_1) \cdots f_1(X_n)} \ge \frac{b}{1 - b}\left(\frac{1}{L_1} - 1\right)$$

$t_b$ chooses $d_2$. Then $t_b$ is a Bayes decision rule relative to $b$, $1 - b$.
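The stopping rule $t_b$ of Theorem 2 can be sketched in code. This is only an illustration: Theorem 1 asserts that $L_1$ and $L_2$ exist but does not compute them, so the values 1/10 and 9/10 used below are hypothetical, as are the function name `t_b`, the two-point distributions `f1` and `f2`, and the convention of returning `None` when the supplied observations run out before a decision is reached.

```python
from fractions import Fraction


def t_b(observations, f1, f2, b, L1, L2):
    """Return (decision, n): stop at the first n for which the ratio
    f2(x1)...f2(xn) / (f1(x1)...f1(xn)) is not strictly between the bounds."""
    lower = b / (1 - b) * (1 / L2 - 1)
    upper = b / (1 - b) * (1 / L1 - 1)
    ratio = Fraction(1)
    n = 0
    for x in observations:
        n += 1
        ratio *= Fraction(f2(x)) / Fraction(f1(x))
        if ratio <= lower:
            return "d1", n
        if ratio >= upper:
            return "d2", n
    return None, n   # observations exhausted before a decision


# Hypothetical two-point distributions: P(X = 1) is 4/5 under F1 and 1/5 under F2.
def f1(x):
    return Fraction(4, 5) if x == 1 else Fraction(1, 5)


def f2(x):
    return Fraction(1, 5) if x == 1 else Fraction(4, 5)


b, L1, L2 = Fraction(1, 2), Fraction(1, 10), Fraction(9, 10)
print(t_b([1, 1], f1, f2, b, L1, L2))
print(t_b([0, 1, 0, 0], f1, f2, b, L1, L2))
```

With these assumed values the bounds are 1/9 and 9, each observation multiplies the ratio by 1/4 or by 4, and sampling continues only while the ratio stays strictly between the bounds.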


Proof. For typographical simplicity, we denote

$$b f_1(x_1) \cdots f_1(x_m) + (1 - b) f_2(x_1) \cdots f_2(x_m) \quad \text{by} \quad q(x_1, \ldots, x_m)$$

for any positive integer $m$.

Let $s$ denote a Bayes decision rule relative to $b$, $1 - b$. From Theorem 1, we know that $s$ certainly observes $X_1$. For any positive integer $m$, let $r(i; s \mid x_1, \ldots, x_m)$ denote the conditional expected loss when using $s$ and $F_i$ is the true distribution, given that we have observed $X_1, \ldots, X_m$ and $X_1 = x_1, \ldots, X_m = x_m$. Let $g(x_1, \ldots, x_{m-1})$ denote the probability assigned by $s$ to observing at least $m$ X's, when the first observation is equal to $x_1$, ..., and the $(m - 1)$st observation is equal to $x_{m-1}$. [Unless $s$ used randomization in making its sampling decisions, the only possible values for $g(x_1, \ldots, x_{m-1})$ would be 0 and 1.]

For the sake of definiteness, we assume for the remainder of the discussion that $F_1(x)$ and $F_2(x)$ increase only in jumps. The case where density functions exist requires extremely simple modifications. We can write, for $i = 1$ or 2,

$$r(i;s) = K_i(m) + \sum_{x_1, \ldots, x_m} g(x_1, \ldots, x_{m-1}) f_i(x_1) \cdots f_i(x_m)\, r(i; s \mid x_1, \ldots, x_m)$$

Here $K_i(m)$ is the conditional expected value of the loss, given that sampling terminates before $X_m$ is observed. The second expression on the right should not be difficult to understand, since $g(x_1, \ldots, x_{m-1}) f_i(x_1) \cdots f_i(x_m)$ is the probability of observing $X_1 = x_1, \ldots, X_m = x_m$ when $s$ is used and $F_i(x)$ is the true cdf for X, while $r(i; s \mid x_1, \ldots, x_m)$ is the conditional expected loss given that we have observed $X_1 = x_1, \ldots, X_m = x_m$. Then we can write

$$b\,r(1;s) + (1 - b)\,r(2;s) = b K_1(m) + (1 - b) K_2(m) + \sum_{x_1, \ldots, x_m} g(x_1, \ldots, x_{m-1})\, q(x_1, \ldots, x_m) \left[ \frac{b f_1(x_1) \cdots f_1(x_m)}{q(x_1, \ldots, x_m)}\, r(1; s \mid x_1, \ldots, x_m) + \frac{(1 - b) f_2(x_1) \cdots f_2(x_m)}{q(x_1, \ldots, x_m)}\, r(2; s \mid x_1, \ldots, x_m) \right]$$

Then it is clear that $s$ will make this last expression as small as possible only if, for each set of values $x_1, \ldots, x_m$, $s$ minimizes

$$\frac{b f_1(x_1) \cdots f_1(x_m)}{q(x_1, \ldots, x_m)}\, r(1; s \mid x_1, \ldots, x_m) + \frac{(1 - b) f_2(x_1) \cdots f_2(x_m)}{q(x_1, \ldots, x_m)}\, r(2; s \mid x_1, \ldots, x_m) \tag{7.13}$$

Next we define a decision rule $s(x_1, \ldots, x_m)$ for each set of values $x_1, \ldots, x_m$ as follows. $s(x_1, \ldots, x_m)$ uses $X_1, X_2, \ldots$ in exactly the way $s$ uses $X_{m+1}, X_{m+2}, \ldots$ after $s$ has observed $X_1 = x_1, \ldots, X_m = x_m$. For example, if after observing $X_1 = x_1, \ldots, X_m = x_m$, $s$ stops and chooses $d_1$, then $s(x_1, \ldots, x_m)$ chooses $d_1$ without observing $X_1$; if after observing $X_1 = x_1, \ldots, X_m = x_m$, $s$ observes $X_{m+1}$ and chooses $d_2$ if $X_{m+1} < 3$, then $s(x_1, \ldots, x_m)$ observes $X_1$ and chooses $d_2$ if $X_1 < 3$.
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Sin , 7 
A се the X^s are independent, and each has the same distribution, and 
S Sias much to observe one X as another, we have 


r(1;s | xy... xS) = М5 sees Xm) + em 


r2; | xy... Xm) = 05 50:55: Ха) + em aga 


Th i 
Y € term cm represents the amount that must be paid for observing 
»-.., Xm an amount that would not be included in r(i; Q5, -+ -> %)), 


m» 


sinc ; НЕ i i 

Өр — Xm) is a decision rule which starts sampling at Х,. 

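Substituting (7.14) into (7.13), we see that for each set of values $x_1, \dots, x_m$, $s$ should minimize

\[ cm + \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)}\, r(1; s(x_1, \dots, x_m)) + \frac{(1-b) f_2(x_1) \cdots f_2(x_m)}{Q(x_1, \dots, x_m)}\, r(2; s(x_1, \dots, x_m)) \tag{7.15} \]

$s$ will minimize (7.15) if $s(x_1, \dots, x_m)$ is a Bayes decision rule relative to

\[ \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)}, \qquad \frac{(1-b) f_2(x_1) \cdots f_2(x_m)}{Q(x_1, \dots, x_m)} \]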
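From Theorem 1, we know that this implies that $s(x_1, \dots, x_m)$ should choose $d_1$ without observing $X_1$ if

\[ \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)} \ge L_2 \]

should choose $d_2$ without observing $X_1$ if

\[ \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)} \le L_1 \]

and should observe $X_1$ if

\[ L_1 < \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)} < L_2 \]

Recalling how $s(x_1, \dots, x_m)$ was defined in terms of $s$, we see that for $s$ to minimize (7.13), $s$ should choose $d_1$ without observing $X_{m+1}$ if

\[ \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)} \ge L_2 \]

$s$ should choose $d_2$ without observing $X_{m+1}$ if

\[ \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)} \le L_1 \]

and $s$ should observe $X_{m+1}$ if

\[ L_1 < \frac{b f_1(x_1) \cdots f_1(x_m)}{Q(x_1, \dots, x_m)} < L_2 \]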


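A simple calculation shows that this description of $s$ is exactly equivalent to the following: $s$ should choose $d_1$ without observing $X_{m+1}$ if

\[ \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} \le \frac{b}{1-b} \left( \frac{1}{L_2} - 1 \right) \]

$s$ should choose $d_2$ without observing $X_{m+1}$ if

\[ \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} \ge \frac{b}{1-b} \left( \frac{1}{L_1} - 1 \right) \]

$s$ should observe $X_{m+1}$ if

\[ \frac{b}{1-b} \left( \frac{1}{L_2} - 1 \right) < \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} < \frac{b}{1-b} \left( \frac{1}{L_1} - 1 \right) \]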

But this description of $s$ shows that $s$ is the same decision rule as the decision rule $t_b$ described in the statement of Theorem 2, and thus proves the theorem, since $s$ is a Bayes decision rule relative to $b$, $1 - b$.

The decision rule $t_b$ is called a "Wald sequential rule," after its discoverer.
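The stopping rule just described can be sketched in a few lines of Python. This is only an illustration: the function name, the bounds `A` and `B` (standing for the upper and lower limits on the ratio $f_2(x_1) \cdots f_2(x_m) / f_1(x_1) \cdots f_1(x_m)$), and the two three-point distributions are assumptions made for the example, not part of the text.

```python
def wald_sequential_rule(observations, f1, f2, A, B):
    """Apply a Wald sequential rule with stopping bounds B < A.

    observations: iterable of observed values x_1, x_2, ...
    f1, f2: functions returning f_1(x) and f_2(x); f_1(x) is assumed > 0.
    Returns (decision, m): decision is 1 or 2 (for d_1 or d_2), or None if
    the observations ran out before either bound was reached.
    """
    ratio = 1.0
    m = 0
    for x in observations:
        m += 1
        ratio *= f2(x) / f1(x)   # f_2(x_1)...f_2(x_m) / f_1(x_1)...f_1(x_m)
        if ratio <= B:
            return 1, m          # choose d_1 without observing X_{m+1}
        if ratio >= A:
            return 2, m          # choose d_2 without observing X_{m+1}
    return None, m


# Hypothetical three-point distributions, used only for illustration.
pmf1 = {0: 0.5, 1: 0.25, 2: 0.25}
pmf2 = {0: 0.25, 1: 0.25, 2: 0.5}
decision, m = wald_sequential_rule([2, 1, 2, 2], pmf1.get, pmf2.get, A=8, B=0.125)
```

The rule always takes at least one observation, and it needs only the running ratio rather than the whole history of observations.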


7.4. Finding a Minimax Wald Sequential Rule. In Sec. 7.3 we described a decision rule which is a Bayes decision rule relative to $b$, $1 - b$. In this section we attempt to find a minimax decision rule for the decision problem described in Sec. 7.3.

First we show that if $a(G_1;d_1) \ge a(G_2;d_1)$, then $s_1$ is a minimax decision rule. This is so because $r(1;s_1) = a(G_1;d_1)$ and $r(2;s_1) = a(G_2;d_1)$, and therefore $M(s_1) = r(1;s_1)$. Since it is known that $s_1$ is a Bayes decision rule relative to 1, 0, the theorem of Sec. 5.14 tells us that $s_1$ is minimax. In exactly the same way, we find that if $a(G_1;d_2) \le a(G_2;d_2)$, then $s_2$ is a minimax decision rule.

From now on we assume that $a(G_1;d_1) < a(G_2;d_1)$ and $a(G_2;d_2) < a(G_1;d_2)$, which is true in all problems of practical interest. If we can find a value $b$ with $L_1 < b < L_2$ such that the Bayes decision rule $t_b$ relative to $b$, $1 - b$ has $r(1;t_b) = r(2;t_b)$, then the theorem of Sec. 5.14 shows that $t_b$ is minimax. The description of $t_b$ given in Sec. 7.3 shows that the decision rule is described completely once we know the values

\[ \frac{b}{1-b} \left( \frac{1}{L_2} - 1 \right) \quad \text{and} \quad \frac{b}{1-b} \left( \frac{1}{L_1} - 1 \right) \]

For typographic simplicity we denote $\frac{b}{1-b} \left( \frac{1}{L_2} - 1 \right)$ by $B$ and $\frac{b}{1-b} \left( \frac{1}{L_1} - 1 \right)$ by $A$. Our next step is to try to find the values of $A$ and $B$ that make $r(1;t_b) = r(2;t_b)$, so that $t_b$ is minimax. Actually, we cannot find the exact values of $A$ and $B$ that do this, so we shall have to be satisfied with approximations.

We denote by $\alpha(A,B)$ the probability of choosing $d_1$ when the true common distribution of the $X$'s is $F_2(x)$ and the decision rule $t_b$ is used. We denote by $\beta(A,B)$ the probability of choosing $d_2$ when the true common distribution of the $X$'s is $F_1(x)$ and the decision rule $t_b$ is used. When the decision rule $t_b$ is used, it is clear that the total number of $X$'s observed before a final decision is chosen is a chance variable, which we denote by $N$. Also, $n_i(A,B)$ denotes the expected value of $N$ when the true common distribution of the $X$'s is $F_i(x)$. Then we have

\[ r(1;t_b) = [1 - \beta(A,B)]\, a(G_1;d_1) + \beta(A,B)\, a(G_1;d_2) + c\, n_1(A,B) \]
\[ r(2;t_b) = \alpha(A,B)\, a(G_2;d_1) + [1 - \alpha(A,B)]\, a(G_2;d_2) + c\, n_2(A,B) \tag{7.16} \]

We shall develop approximations for $\alpha(A,B)$, $\beta(A,B)$, $n_1(A,B)$, and $n_2(A,B)$. This will enable us to find approximately the values of $A$ and $B$ that make $r(1;t_b) = r(2;t_b)$.
Let $S_i(m)$ denote the set of $m$-dimensional points $x_1, \dots, x_m$ such that if $X_1 = x_1, \dots, X_m = x_m$, then $t_b$ continues sampling to $X_m$ and chooses $d_i$ without observing $X_{m+1}$. Then, when $F_j(x)$ is the true cdf for the $X$'s,

\[ P(d_i \text{ is chosen and } N = m) = \sum_{S_i(m)} f_j(x_1) \cdots f_j(x_m) \]

At every point $x_1, \dots, x_m$ in $S_1(m)$,

\[ \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} \le B \]

and at every point $x_1, \dots, x_m$ in $S_2(m)$,

\[ \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} \ge A \]

Our first approximation is to assume that at every point in $S_1(m)$,

\[ \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} = B \]

and at every point in $S_2(m)$,

\[ \frac{f_2(x_1) \cdots f_2(x_m)}{f_1(x_1) \cdots f_1(x_m)} = A \]

This approximation looks drastic, but it works fairly well if the functions $f_1(x)$ and $f_2(x)$ do not differ greatly from each other, for then the ratio $f_2(x_m)/f_1(x_m)$ will probably not move the whole ratio very far below $B$ or above $A$. Using this approximation, we have


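\[ \sum_{S_1(m)} f_2(x_1) \cdots f_2(x_m) = B \sum_{S_1(m)} f_1(x_1) \cdots f_1(x_m) \]
\[ \sum_{S_2(m)} f_2(x_1) \cdots f_2(x_m) = A \sum_{S_2(m)} f_1(x_1) \cdots f_1(x_m) \tag{7.17} \]

Also

\[ \alpha(A,B) = \sum_{m=1}^{\infty} \sum_{S_1(m)} f_2(x_1) \cdots f_2(x_m) \]
\[ 1 - \alpha(A,B) = \sum_{m=1}^{\infty} \sum_{S_2(m)} f_2(x_1) \cdots f_2(x_m) \]
\[ \beta(A,B) = \sum_{m=1}^{\infty} \sum_{S_2(m)} f_1(x_1) \cdots f_1(x_m) \]
\[ 1 - \beta(A,B) = \sum_{m=1}^{\infty} \sum_{S_1(m)} f_1(x_1) \cdots f_1(x_m) \tag{7.18} \]

Summing the expressions given in (7.17) with respect to $m$ and using (7.18), we find

\[ \alpha(A,B) = B[1 - \beta(A,B)] \]
\[ 1 - \alpha(A,B) = A\,\beta(A,B) \]

which is equivalent to

\[ \beta(A,B) = \frac{1 - B}{A - B} \]
\[ \alpha(A,B) = \frac{B(A - 1)}{A - B} \tag{7.19} \]

It should be remembered that these expressions are not exact, but are approximations.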
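As a quick numerical check of (7.19), the two approximate error probabilities can be computed directly from $A$ and $B$. The sketch below is illustrative only; the function name is made up, and the requirement $0 < B < 1 < A$ is an assumption that holds when $L_1 < b < L_2$.

```python
from fractions import Fraction


def approx_error_probabilities(A, B):
    """Approximations (7.19) for alpha(A,B) and beta(A,B).

    Assumes 0 < B < 1 < A. Works with floats or exact Fractions.
    """
    if not (0 < B < 1 < A):
        raise ValueError("expected 0 < B < 1 < A")
    alpha = B * (A - 1) / (A - B)   # approx. P(choose d_1 | F_2 true)
    beta = (1 - B) / (A - B)        # approx. P(choose d_2 | F_1 true)
    return alpha, beta


# Illustrative bounds, chosen only so that the arithmetic is easy to follow.
alpha, beta = approx_error_probabilities(Fraction(9), Fraction(1, 9))
```

With these bounds both values equal 1/10, and the pair satisfies the two relations from which (7.19) was derived: $\alpha = B(1 - \beta)$ and $1 - \alpha = A\beta$.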


Next we develop approximations for $n_1(A,B)$ and $n_2(A,B)$. These two quantities are finite, otherwise at least one of the quantities $r(1;t_b)$, $r(2;t_b)$ would be infinite [Eq. (7.16)] and $t_b$ would not be a Bayes decision rule relative to $b$, $1 - b$. This means that under either $F_1(x)$ or $F_2(x)$, $\sum_{j=1}^{\infty} j P(N = j) < \infty$, so that $\sum_{j=m}^{\infty} j P(N = j)$ approaches zero as $m$ increases. Since

\[ \sum_{j=m}^{\infty} j P(N = j) \ge m \sum_{j=m}^{\infty} P(N = j) = m P(N \ge m) = m[1 - P(N < m)] \]

we have shown that under either $F_1(x)$ or $F_2(x)$,

\[ \lim_{m \to \infty} m[1 - P(N < m)] = 0 \]

Denote $\log [f_2(X_j)/f_1(X_j)]$ by $Z_j$. We show that under either $F_1(x)$ or $F_2(x)$, $E\{Z_1 + \cdots + Z_N\} = E\{N\}\, E\{Z_1\}$, by the following argument. For any fixed positive integer $m$,

\[ m E\{Z_1\} = E\{Z_1 + \cdots + Z_m\} = \sum_{j=1}^{m} P(N = j)\, E\{Z_1 + \cdots + Z_m \mid N = j\} + P(N > m)\, E\{Z_1 + \cdots + Z_m \mid N > m\} \]

But $E\{Z_1 + \cdots + Z_m \mid N = j\} = E\{Z_1 + \cdots + Z_j \mid N = j\} + E\{Z_{j+1} + \cdots + Z_m \mid N = j\}$, and knowing that $N = j$ gives no information at all about $Z_{j+1}, \dots, Z_m$, so that $E\{Z_{j+1} + \cdots + Z_m \mid N = j\} = (m - j) E\{Z_1\}$, since the marginal unconditional distributions of $Z_1, \dots, Z_m$ are all identical. Then we have

\[ m E\{Z_1\} = \sum_{j=1}^{m} P(N = j)\, E\{Z_1 + \cdots + Z_j \mid N = j\} + \sum_{j=1}^{m} P(N = j)(m - j)\, E\{Z_1\} + P(N > m)\, E\{Z_1 + \cdots + Z_m \mid N > m\} \tag{7.20} \]

Rearranging (7.20), we get

\[ \sum_{j=1}^{m} P(N = j)\, E\{Z_1 + \cdots + Z_j \mid N = j\} = E\{Z_1\} \left[ m - m P(N \le m) + \sum_{j=1}^{m} j P(N = j) \right] - P(N > m)\, E\{Z_1 + \cdots + Z_m \mid N > m\} \tag{7.21} \]

Now we let $m$ increase in (7.21). The left-hand side approaches $E\{Z_1 + \cdots + Z_N\}$ as $m$ approaches infinity. The expression $m - mP(N \le m)$ can be written $m[1 - P(N \le m)]$, which is not larger than $m[1 - P(N < m)]$, and this last expression we know approaches zero as $m$ approaches infinity, showing that $m - mP(N \le m)$ approaches zero as $m$ approaches infinity. $\sum_{j=1}^{m} j P(N = j)$ approaches $E\{N\}$ as $m$ approaches infinity. Since $N > m$ implies that $\log B < Z_1 + \cdots + Z_m < \log A$, because of the structure of $t_b$, it is clear that $|E\{Z_1 + \cdots + Z_m \mid N > m\}| < \max (|\log B|, |\log A|)$. Since $P(N > m)$ approaches zero as $m$ approaches infinity, it follows that $P(N > m)\, E\{Z_1 + \cdots + Z_m \mid N > m\}$ approaches zero as $m$ approaches infinity. Applying these facts to (7.21) and letting $m$ increase, we get

\[ E\{Z_1 + \cdots + Z_N\} = E\{N\}\, E\{Z_1\} \tag{7.22} \]
We develop an approximation for $E\{Z_1 + \cdots + Z_N\}$ as follows. From the structure of $t_b$, $Z_1 + \cdots + Z_N$ is either no greater than $\log B$ (then $d_1$ is chosen) or no less than $\log A$ (then $d_2$ is chosen). Clearly, if $F_1(x)$ is the true common cdf, $P(Z_1 + \cdots + Z_N \le \log B) = 1 - \beta(A,B)$, while $P(Z_1 + \cdots + Z_N \ge \log A) = \beta(A,B)$. Similarly, if $F_2(x)$ is the true common cdf, $P(Z_1 + \cdots + Z_N \le \log B) = \alpha(A,B)$, while $P(Z_1 + \cdots + Z_N \ge \log A) = 1 - \alpha(A,B)$. As an approximation, we compute $E\{Z_1 + \cdots + Z_N\}$ by assuming that $Z_1 + \cdots + Z_N = \log B$ exactly whenever $Z_1 + \cdots + Z_N \le \log B$, and that $Z_1 + \cdots + Z_N = \log A$ exactly whenever $Z_1 + \cdots + Z_N \ge \log A$. This approximation works fairly well if $f_1(x)$ and $f_2(x)$ do not differ greatly from each other, for then the quantity $\log [f_2(X_N)/f_1(X_N)]$ will probably be close to zero and will not move the sum $Z_1 + \cdots + Z_N$ very far below $\log B$ or above $\log A$. Using this approximation, we find that if $F_1(x)$ is the true common cdf, $E\{Z_1 + \cdots + Z_N\} = [1 - \beta(A,B)] \log B + \beta(A,B) \log A$, and if $F_2(x)$ is the true common cdf, $E\{Z_1 + \cdots + Z_N\} = \alpha(A,B) \log B + [1 - \alpha(A,B)] \log A$. Now we denote $E\{Z_1\}$ when $F_i(x)$ is the true cdf by $h_i$, $i = 1, 2$. Then, using (7.22) and the approximation just developed, we find approximately

\[ n_1(A,B) = \frac{[1 - \beta(A,B)] \log B + \beta(A,B) \log A}{h_1} \]
\[ n_2(A,B) = \frac{\alpha(A,B) \log B + [1 - \alpha(A,B)] \log A}{h_2} \tag{7.23} \]

(It can be shown that neither $h_1$ nor $h_2$ is equal to zero.)
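The approximations (7.23) are easy to evaluate once $A$, $B$, $h_1$, and $h_2$ are known. The following sketch is only an illustration; the function name and the numerical inputs are hypothetical, and natural logarithms are assumed throughout.

```python
from math import log


def approx_expected_sample_sizes(A, B, h1, h2):
    """Approximations (7.23) for n_1(A,B) and n_2(A,B).

    Assumes 0 < B < 1 < A, h1 < 0 < h2, and natural logarithms.
    """
    alpha = B * (A - 1) / (A - B)   # approximation (7.19)
    beta = (1 - B) / (A - B)        # approximation (7.19)
    n1 = ((1 - beta) * log(B) + beta * log(A)) / h1
    n2 = (alpha * log(B) + (1 - alpha) * log(A)) / h2
    return n1, n2


# Hypothetical inputs: symmetric bounds and h_1 = -h_2 = -1.
n1, n2 = approx_expected_sample_sizes(A=9.0, B=1 / 9, h1=-1.0, h2=1.0)
```

In this symmetric case both expected sample sizes come out equal, at $0.8 \log 9$. The sign convention matters: under $F_1(x)$ the sum $Z_1 + \cdots + Z_N$ drifts downward, so $h_1$ is negative and the quotient is positive.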


Finally, applying the approximations (7.19) and (7.23) to (7.16) we get the approximations

\[ r(1;t_b) = \frac{A - 1}{A - B}\, a(G_1;d_1) + \frac{1 - B}{A - B}\, a(G_1;d_2) + \frac{c}{h_1} \left[ \frac{A - 1}{A - B} \log B + \frac{1 - B}{A - B} \log A \right] \]
\[ r(2;t_b) = \frac{B(A - 1)}{A - B}\, a(G_2;d_1) + \frac{A(1 - B)}{A - B}\, a(G_2;d_2) + \frac{c}{h_2} \left[ \frac{B(A - 1)}{A - B} \log B + \frac{A(1 - B)}{A - B} \log A \right] \tag{7.24} \]

To find the values of $A$ and $B$ that give an approximately minimax decision rule, we equate the two right-hand expressions in (7.24) and use this equation to express $A$ in terms of $B$, and then we choose $B$ to minimize the right-hand sides of (7.24). The actual computation of $A$ and $B$ from these conditions can be quite laborious. Sometimes the use of graphs is convenient. In the next section we discuss a numerical example.

7.5. A Numerical Example of an Approximately Minimax Wald Sequential Rule. Suppose an agricultural area has been invaded by a swarm of insects, which are busily laying eggs. Unless the area is sprayed in time, the hatched larvae will devour an expected $250,000 worth of crops. However, the insects are all of variety $V_1$ or variety $V_2$, which one is not known, and only one specific spray is effective against $V_1$, while a different spray is effective against $V_2$. Also, only one spray can be used, since in combination they poison all animal life. It is impossible to tell for certain which variety of insect the invaders are until the eggs hatch (and then it is too late to spray), but the geneticists inform us that the number of spots on an internal organ of each insect is a chance variable whose distribution depends on the variety of insect. For an insect of variety $V_1$, we have the distribution $f_1(x)$ given as follows:

    x         0      1      2
    f_1(x)    9/16   6/16   1/16

For an insect of variety $V_2$, we have the distribution $f_2(x)$ given as follows:

    x         0      1      2
    f_2(x)    1/16   6/16   9/16

It costs $1 to have the number of spots counted for each insect. The cost of catching the insects is negligible. The problem is to decide which type of spray to use on the basis of observations of the numbers of spots on as many insects as we care to inspect.

We label as $d_1$ the decision to use the spray effective against $V_1$ and as $d_2$ the decision to use the spray effective against $V_2$. $X_1, X_2, \dots$ are the numbers of spots on the first, second, ... insect we inspect. The chance variable $Y$ is the value of the crops that will be destroyed by the larvae. We assume that if the correct spray is chosen, it is 100 per cent effective. Then we have $a(G_1;d_1) = 0$, $a(G_1;d_2) = 250{,}000$, $a(G_2;d_1) = 250{,}000$, $a(G_2;d_2) = 0$. Also, $c = 1$, and $h_1 = (1/16) \log 9 + (6/16) \log 1 + (9/16) \log (1/9) = -1.1$, $h_2 = (9/16) \log 9 + (6/16) \log 1 + (1/16) \log (1/9) = 1.1$. Then the right-hand sides of Eq. (7.24) of Sec. 7.4 become

\[ 250{,}000\, \frac{1 - B}{A - B} - \frac{1}{1.1} \left[ \frac{A - 1}{A - B} \log B + \frac{1 - B}{A - B} \log A \right] \]
\[ 250{,}000\, \frac{B(A - 1)}{A - B} + \frac{1}{1.1} \left[ \frac{B(A - 1)}{A - B} \log B + \frac{A(1 - B)}{A - B} \log A \right] \tag{7.25} \]

Equating these expressions gives $A = 1/B$. Substituting $A = 1/B$ in the first expression of (7.25) gives

\[ \frac{250{,}000\, B}{1 + B} - \frac{(1 - B) \log B}{1.1\, (1 + B)} \]

The value of $B$ that minimizes this last expression is approximately 1/274,975. Therefore the value of $A$ is approximately 274,975.

Denote by $T_i(m)$ the number of the chance variables $X_1, \dots, X_m$ which are equal to $i$ ($i$ can be 0, 1, or 2). Then our approximately minimax Wald sequential rule can be described as follows: Continue sampling as long as $1/274{,}975 < 9^{T_2(m) - T_0(m)} < 274{,}975$, and as soon as one of these inequalities fails to hold, stop sampling and choose $d_1$ if $9^{T_2(m) - T_0(m)} \le 1/274{,}975$; choose $d_2$ if $9^{T_2(m) - T_0(m)} \ge 274{,}975$.

It is easily seen that in any problem where $a(G_1;d_1) = a(G_2;d_2)$, $a(G_2;d_1) = a(G_1;d_2)$, and $h_1 = -h_2$, then the $A$ and $B$ that give the approximately minimax Wald sequential rule are related by the equation $AB = 1$. Our example illustrates this.
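The figure 274,975 can be reproduced numerically. The sketch below is an illustration under stated assumptions: it takes $A = 1/B$, uses the rounded value 1.1 for $|h_1| = |h_2|$ and natural logarithms, and locates the minimum by solving the stationarity condition $u - 1/u + 2 \log u = 250{,}000 \times 1.1$ with $u = 1/B$. That condition is obtained here by differentiating the common risk with respect to $B$; it is not stated in the text, and all names in the code are made up for the example.

```python
from math import log

LOSS = 250_000.0   # a(G_1;d_2) = a(G_2;d_1)
H = 1.1            # |h_1| = |h_2|, rounded as in the text
C = 1.0            # cost of one observation


def common_risk(B):
    """Approximate r(1;t_b) = r(2;t_b) when A = 1/B."""
    return LOSS * B / (1 + B) - C * (1 - B) * log(B) / ((1 + B) * H)


def stationarity(u):
    """Zero exactly where d(common_risk)/dB = 0, written in u = 1/B."""
    return u - 1 / u + 2 * log(u) - LOSS * H / C


# Bisection: stationarity is increasing in u, negative at 1, positive at 1e7.
lo, hi = 1.0, 1e7
for _ in range(200):
    mid = (lo + hi) / 2
    if stationarity(mid) < 0:
        lo = mid
    else:
        hi = mid

A_star = (lo + hi) / 2
B_star = 1 / A_star

# Smallest k with 9**k >= A_star: the rule stops once |T_2(m) - T_0(m)| = k.
boundary_steps = 0
while 9 ** boundary_steps < A_star:
    boundary_steps += 1
```

Since $9^5 = 59{,}049$ and $9^6 = 531{,}441$, the bound 274,975 is first crossed at the sixth power, so under these numbers the rule amounts to sampling until the count of insects with two spots and the count with no spots differ by 6.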


7.6. Problems in Which a Sequence of Regular Decisions Must Be Made over Time. Denote by $Y_1(j), \dots, Y_n(j)$ the chance variables that will be observed between the $j$th time at which we must choose a decision and the $(j + 1)$st time at which we must choose a decision. $X_1, \dots, X_m$ denote, as usual, the chance variables that are observed before any decision must be chosen. To save space, we denote the set $Y_1(j), \dots, Y_n(j)$ by $Y(j)$ and the set $X_1, \dots, X_m$ by $X$. $D(j)$ denotes the decision made at the $j$th time. Suppose that a decision must be chosen at $T$ different times. The loss depends on $X, D(1), Y(1), D(2), Y(2), \dots, D(T), Y(T)$ and will be written $W(X, D(1), Y(1), D(2), Y(2), \dots, D(T), Y(T))$.

The key fact about the construction of a Bayes decision rule relative to a given $B(\theta)$ in this case is that we must first describe how the decision rule chooses $D(T)$; then we describe how the decision rule chooses $D(T - 1)$; then how the decision rule chooses $D(T - 2)$; etc. In other words, we must work our way backward in the construction of a Bayes decision rule. This is because in order to evaluate the goodness of a decision to be made at a certain time, we have to know how we are going to proceed in the future (that is, how we are going to make decisions in the future).

In choosing $D(T)$, our decision rule will of course take into account the values of $X, D(1), Y(1), D(2), Y(2), \dots, D(T - 1), Y(T - 1)$, which will be known at the time $D(T)$ will have to be chosen. Thus, for the problem of choosing $D(T)$, the quantities $X, D(1), Y(1), D(2), Y(2), \dots, D(T - 1), Y(T - 1)$ play the role that $X_1, \dots, X_m$ played in the problems of Chap. 5, where $T$ was equal to 1, while $Y(T)$ plays the role that $Y_1, \dots, Y_n$ did in the problems of Chap. 5. Thus the problem of describing how a Bayes decision rule relative to $B(\theta)$ chooses $D(T)$ is a problem of the type described in Chap. 5.

After we have described how the decision rule chooses $D(T)$, we have expressed $D(T)$ in terms of $X, D(1), Y(1), D(2), Y(2), \dots, D(T - 1), Y(T - 1)$. Thus for the problem of choosing $D(T - 1)$, we have eliminated $D(T)$ by expressing it in terms of the quantities $X, D(1), Y(1), \dots, D(T - 1), Y(T - 1)$. In choosing $D(T - 1)$, our decision rule will take into account the values of $X, D(1), Y(1), D(2), Y(2), \dots, D(T - 2), Y(T - 2)$, which will be known at the time $D(T - 1)$ will have to be chosen. Thus, for the problem of choosing $D(T - 1)$, the quantities $X, D(1), Y(1), D(2), Y(2), \dots, D(T - 2), Y(T - 2)$ play the role that $X_1, \dots, X_m$ played in Chap. 5, while the quantities $Y(T - 1), Y(T)$ play the role that $Y_1, \dots, Y_n$ played in Chap. 5. Thus, once we have described how the decision rule chooses $D(T)$, deciding how the decision rule chooses $D(T - 1)$ is a problem of the type described in Chap. 5.

The reasoning used above can be applied to the problem of choosing $D(T - 2)$, $D(T - 3)$, etc. For the problem of choosing $D(j)$, we have already described how the decision rule chooses $D(T), D(T - 1), \dots, D(j + 1)$. But this means that we have expressed $D(T), D(T - 1), \dots, D(j + 1)$ in terms of $X, D(1), Y(1), \dots, D(j - 1), Y(j - 1), D(j), Y(j), Y(j + 1), \dots, Y(T - 1)$, eliminating the quantities $D(T), D(T - 1), \dots, D(j + 1)$. Then the problem of choosing $D(j)$ is a problem of the type discussed in Chap. 5. The quantities $X, D(1), Y(1), \dots, D(j - 1), Y(j - 1)$ play the role that $X_1, \dots, X_m$ played in Chap. 5, and the quantities $Y(j), Y(j + 1), \dots, Y(T)$ play the role that $Y_1, \dots, Y_n$ played in Chap. 5.

Thus the construction of an over-all Bayes decision rule relative to $B(\theta)$ requires $T$ applications of the technique of Chap. 5: one for describing how $D(T)$ is to be chosen; then one for describing how $D(T - 1)$ is to be chosen; ...; then one for describing how $D(1)$ is to be chosen.

A numerical example will illustrate the description above. Suppose a company has promised to deliver two items of a certain type to a customer by a certain date. The production of these items is a two-stage process, and in each stage there is a constant probability $1 - \theta$ that the item will be spoiled during the stage. Only the half-finished items that survive the first stage can enter the second stage. Spoiled and surplus items have no value. Because of time limitations, no reruns are possible. The maximum capacity of production in each stage is 5. Before starting the first stage of the production process, the company will be able to observe the number of items surviving a single-stage process in which the probability of spoilage of each item is $1 - \theta$, out of two items started through the single-stage process. Costs are as follows. The company is to be paid $2,000, but will pay a penalty cost of $1,200 if it delivers only one item and a penalty cost of $2,400 if it delivers none. It costs the company $300 for each item started through the first stage of production and $100 for each item started through the second stage of production. Suppose the value of $\theta$ is unknown (except for the obvious fact that it is between 0 and 1) and we want to construct a Bayes decision rule for the problem relative to the a priori distribution $B(\theta)$ which has pdf $b(\theta) = 2(1 - \theta)$ for $0 < \theta < 1$.

We set up the following notation. $X$ denotes the number of items surviving out of the two items started through the single-stage process that will be observed before any decisions must be chosen. $D(1)$ is the number of items started through the first stage of the two-stage process. Because of the restriction on capacity, the possible values of $D(1)$ are 0, 1, 2, 3, 4, 5. $Y(1)$ is the number of items surviving the first stage. $D(2)$ is the number of items started through the second stage. $Y(2)$ is the number of items surviving the second stage. Clearly, $D(1) \ge Y(1)$ and $D(2) \ge Y(2)$. From the description above, it is easily seen that the loss function does not depend on $X$, and is given as follows:

\[ W(D(1), Y(1), D(2), Y(2)) = -2{,}000 + 300\, D(1) + 100\, D(2) \quad \text{if } Y(2) \ge 2 \]
\[ W(D(1), Y(1), D(2), Y(2)) = -2{,}000 + 300\, D(1) + 100\, D(2) + 1{,}200\,[2 - Y(2)] \quad \text{if } Y(2) < 2 \]

Also, it is clear that for given values of $D(1)$ and $D(2)$, the chance variables $X$, $Y(1)$, and $Y(2)$ are independent, and each has a binomial distribution: the parameters for $X$ are 2, $\theta$; the parameters for $Y(1)$ are $D(1)$, $\theta$; the parameters for $Y(2)$ are $D(2)$, $\theta$. Thus for nonnegative integers $x$, $y(1)$, $y(2)$, with $x \le 2$, $y(1) \le D(1)$, and $y(2) \le D(2)$, $f(x, y(1), y(2); \theta)$ is equal to the product of the following three expressions:

\[ \frac{2!}{x!\,(2 - x)!}\, \theta^{x} (1 - \theta)^{2 - x} \]
\[ \frac{D(1)!}{y(1)!\,(D(1) - y(1))!}\, \theta^{y(1)} (1 - \theta)^{D(1) - y(1)} \]
\[ \frac{D(2)!}{y(2)!\,(D(2) - y(2))!}\, \theta^{y(2)} (1 - \theta)^{D(2) - y(2)} \]
The first step in the construction of a Bayes decision rule relative to $B(\theta)$ is the description of how the rule chooses $D(2)$ for given values of $x$, $D(1)$, $y(1)$. We compute $K(D(2); x, D(1), y(1))$ exactly as in Chap. 5 and choose the value of $D(2)$ that minimizes $K(D(2); x, D(1), y(1))$ for the given values $x$, $D(1)$, $y(1)$. We have

\[ K(D(2); x, D(1), y(1)) = \int_0^1 b(\theta) \left[ \sum_{y(2)=0}^{D(2)} W(D(1), y(1), D(2), y(2))\, f(x, y(1), y(2); \theta) \right] d\theta \]

Carrying out the summation with respect to $y(2)$ gives

\[ K(D(2); x, D(1), y(1)) = \int_0^1 b(\theta)\, \frac{2!}{x!\,(2 - x)!}\, \frac{D(1)!}{y(1)!\,(D(1) - y(1))!}\, \theta^{x + y(1)} (1 - \theta)^{2 + D(1) - x - y(1)} \left[ -2{,}000 + 300 D(1) + 100 D(2) + 2{,}400 (1 - \theta)^{D(2)} + 1{,}200\, D(2)\, \theta (1 - \theta)^{D(2) - 1} \right] d\theta \]

Setting $b(\theta) = 2(1 - \theta)$ and integrating, making use of the fact that

\[ \int_0^1 \theta^{r} (1 - \theta)^{s}\, d\theta = \frac{r!\, s!}{(r + s + 1)!} \]

for all nonnegative integers $r$ and $s$, we find that $K(D(2); x, D(1), y(1))$ is given by the expression

\[ \frac{2 \cdot 2!\, D(1)!}{x!\,(2 - x)!\, y(1)!\,(D(1) - y(1))!} \left\{ [-2{,}000 + 300 D(1) + 100 D(2)]\, \frac{(x + y(1))!\,(3 + D(1) - x - y(1))!}{(4 + D(1))!} + 2{,}400\, \frac{(x + y(1))!\,(3 + D(1) + D(2) - x - y(1))!}{(4 + D(1) + D(2))!} + 1{,}200\, D(2)\, \frac{(x + y(1) + 1)!\,(2 + D(1) + D(2) - x - y(1))!}{(4 + D(1) + D(2))!} \right\} \]

A detailed calculation and comparison shows that if $D(1) \le 4$, then $D(2)$ should be set equal to $Y(1)$ no matter what the value of $X$ is. If $D(1) = 5$ and $X = 0$, we should set $D(2)$ equal to $Y(1)$, while if $X$ is 1 or 2, then $D(2)$ should be set equal to $Y(1)$ if $Y(1) \le 4$, and should be set equal to 4 if $Y(1)$ is equal to 5. This completely describes how the Bayes decision rule chooses $D(2)$.

Our next task is to describe how the decision rule chooses $D(1)$. To do this, we compute

\[ K(D(1); x) = \int_0^1 b(\theta) \left[ \sum_{y(1)=0}^{D(1)} \sum_{y(2)=0}^{D(2)} W(D(1), y(1), D(2), y(2))\, f(x, y(1), y(2); \theta) \right] d\theta \]

where in the computation the value of $D(2)$ is determined by the values of $x$, $D(1)$, and $y(1)$, as described in the preceding paragraph. The computation of $K(D(1); x)$ is routine but lengthy. The numerical results are

    K(0;0) = 200      K(0;1) = 133.3    K(0;2) = 66.7
    K(1;0) = 320      K(1;1) = 166.7    K(1;2) = 46.7
    K(2;0) = 440      K(2;1) = 200      K(2;2) = 26.7
    K(3;0) = 563.1    K(3;1) = 246.7    K(3;2) = 33.3
    K(4;0) = 689      K(4;1) = 305.4    K(4;2) = 51
    K(5;0) = 818      K(5;1) = 374      K(5;2) = 89

From these values, we see that the Bayes decision rule sets $D(1) = 0$ if $X = 0$ or 1 and sets $D(1) = 2$ if $X = 2$.

In summary, a Bayes decision rule relative to $B(\theta)$ is to produce nothing if $X = 0$ or 1 and to start two items through the first stage if $X = 2$. All survivors (if any) of the first stage are started through the second stage when $D(1) = 2$.
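The backward construction used in this example can be sketched in Python with exact arithmetic. This is an illustration only: the function names are invented, $D(2)$ is restricted to the values $0, \dots, y(1)$, and each integral is evaluated term by term with the formula $\int_0^1 \theta^r (1 - \theta)^s\, d\theta = r!\, s!/(r + s + 1)!$.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(r, s):
    """Integral of theta**r * (1 - theta)**s over 0..1, for integers r, s >= 0."""
    return Fraction(factorial(r) * factorial(s), factorial(r + s + 1))


def loss(d1, d2, y2):
    """Loss W as a function of D(1), D(2), and Y(2)."""
    base = -2000 + 300 * d1 + 100 * d2
    return base if y2 >= 2 else base + 1200 * (2 - y2)


def k_stage2(d2, x, d1, y1):
    """K(D(2); x, D(1), y(1)) with prior pdf b(theta) = 2(1 - theta)."""
    total = Fraction(0)
    for y2 in range(d2 + 1):
        coeff = 2 * comb(2, x) * comb(d1, y1) * comb(d2, y2) * loss(d1, d2, y2)
        r = x + y1 + y2                                  # power of theta
        s = 1 + (2 - x) + (d1 - y1) + (d2 - y2)          # power of (1 - theta)
        total += coeff * beta_integral(r, s)
    return total


def best_d2(x, d1, y1):
    """Second-stage decision: the D(2) in 0..y(1) that minimizes K."""
    return min(range(y1 + 1), key=lambda d2: k_stage2(d2, x, d1, y1))


def k_stage1(d1, x):
    """K(D(1); x), with D(2) already chosen optimally for each y(1)."""
    return sum(k_stage2(best_d2(x, d1, y1), x, d1, y1) for y1 in range(d1 + 1))


# First-stage decision for each possible x, with capacity 5.
rule = {x: min(range(6), key=lambda d1: k_stage1(d1, x)) for x in range(3)}
```

The second-stage choice is settled first and then substituted into the first-stage computation, which is the order of work the section describes. Entries such as $K(0;0) = 200$, $K(1;0) = 320$, and $K(2;2) = 80/3$ come out exactly.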


Chapter 8 


THE EMPIRICAL CUMULATIVE 
DISTRIBUTION FUNCTION 


8.1. Introduction. In many important problems, $X_1, \dots, X_m, Y_1, \dots, Y_n$ are all independently distributed, each with the same distribution, which is unknown, and the loss does not depend on the values of $X_1, \dots, X_m$. Then we can write the loss function as $W(Y_1, \dots, Y_n; D)$. Denote the common unknown cdf of $X_1, \dots, X_m, Y_1, \dots, Y_n$ by $F(x)$. The expected value of the loss, for any fixed decision $D$, depends only on $D$ and on $F(x)$, and we denote this expected loss by $S(F;D)$. If we knew $F(x)$, we would simply choose the value of $D$ to minimize $S(F;D)$, and it would not be necessary to observe $X_1, \dots, X_m$. It is because we do not know $F(x)$ that we observe $X_1, \dots, X_m$ and try to estimate $F(x)$ from the observed values of $X_1, \dots, X_m$. In this chapter we describe a certain way of estimating $F(x)$.

We define the "empirical cdf based on $X_1, \dots, X_m$," denoted by $H(x; X_1, \dots, X_m)$, as follows: for any given $x$,

\[ H(x; X_1, \dots, X_m) = \frac{\text{number of variables } X_1, \dots, X_m \text{ no greater than } x}{m} \]

Note that this function $H(x; X_1, \dots, X_m)$ is defined for all $x$ and has the essential properties of a cdf: it is nondecreasing, approaches 0 as $x$ decreases, and approaches 1 as $x$ increases. We shall show that as $m$ increases, the probability that $H(x; X_1, \dots, X_m)$ is close to $F(x)$ for all values of $x$ approaches 1.
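The definition of the empirical cdf translates directly into code. The sketch below is a minimal illustration; the function name and the sample values are made up.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return the function x -> H(x; X_1, ..., X_m) for the given sample."""
    ordered = sorted(sample)
    m = len(ordered)

    def H(x):
        # bisect_right counts the observations that are no greater than x.
        return bisect_right(ordered, x) / m

    return H


H = empirical_cdf([3, 1, 2, 2])   # hypothetical observations
values = [H(0), H(1), H(2), H(2.5), H(3)]
```

Sorting once and then using a binary search keeps each evaluation of $H$ cheap, which is convenient when $H$ must be evaluated at many points.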
8.2. Stochastic Convergence and Tchebycheff's Inequality. If $Z_1, Z_2, \dots$ are chance variables and $k$ is a constant, we say "$Z_i$ converges stochastically to $k$ as $i$ increases" if the following is true: for any pair of positive numbers $\epsilon$, $\delta$, we can find a positive integer $n(\epsilon,\delta)$ such that for each and every integer $m$ above $n(\epsilon,\delta)$, the inequality $P(|Z_m - k| < \epsilon) > 1 - \delta$ holds. Speaking roughly, to say that $Z_m$ converges stochastically to $k$ as $m$ increases means that for large $m$ the probability that $Z_m$ is close to $k$ is high.

A useful device for proving stochastic convergence is "Tchebycheff's inequality," which states that if $W$ is a chance variable such that $P(W < 0) = 0$, then for any positive number $b$, $P(W \le b) \ge 1 - E\{W\}/b$.

First we prove Tchebycheff's inequality for the case where $W$ has a pdf $g(w)$. Since $P(W < 0) = 0$, $g(w) = 0$ for $w < 0$. Then

\[ E\{W\} = \int_0^{\infty} w g(w)\, dw = \int_0^{b} w g(w)\, dw + \int_b^{\infty} w g(w)\, dw \ge \int_b^{\infty} w g(w)\, dw \ge \int_b^{\infty} b g(w)\, dw = b \int_b^{\infty} g(w)\, dw = b P(W > b) \]

which gives $P(W > b) \le E\{W\}/b$, or $P(W \le b) \ge 1 - E\{W\}/b$, completing the proof.

Next we prove Tchebycheff's inequality for the case where $W$ has a distribution which can be given in table form, as follows:

    Possible values    w_1   w_2   w_3   ...
    Probability        p_1   p_2   p_3   ...

where we may assume that $0 \le w_1 < w_2 < \cdots$. Denote by $k$ the largest integer such that $w_k \le b$. Then

\[ E\{W\} = \sum_{i=1}^{\infty} p_i w_i = \sum_{i=1}^{k} p_i w_i + \sum_{i=k+1}^{\infty} p_i w_i \ge \sum_{i=k+1}^{\infty} p_i w_i \ge b \sum_{i=k+1}^{\infty} p_i = b P(W > b) \]

which gives $P(W > b) \le E\{W\}/b$, or $P(W \le b) \ge 1 - E\{W\}/b$, completing the proof.

A useful application of Tchebycheff's inequality will now be described. Suppose $Z_i$ has a binomial distribution with parameters $i$, $p$, for each positive integer $i$. Then we show that $Z_i/i$ converges stochastically to $p$ as $i$ increases. To do this, we define $W_i$ as $((Z_i/i) - p)^2$. Then of course $P(W_i < 0) = 0$, and it is easily found that $E\{W_i\} = p(1 - p)/i$. Tchebycheff's inequality then gives $P(W_i \le b) \ge 1 - p(1 - p)/(bi)$, for any positive $b$. But $P(W_i \le b) = P(|(Z_i/i) - p| \le \sqrt{b})$. If we set $b = \epsilon^2$, then for every $i$ greater than $p(1 - p)/(\delta \epsilon^2)$ we have $P(|(Z_i/i) - p| \le \epsilon) > 1 - \delta$. This completes the proof that $Z_i/i$ converges stochastically to $p$ as $i$ increases.
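The binomial application can be checked exactly for particular numbers. The sketch below compares the exact probability $P(|(Z_i/i) - p| \le \epsilon)$ with the lower bound $1 - p(1 - p)/(i\epsilon^2)$ given by Tchebycheff's inequality. The function names and the chosen values of $i$, $p$, and $\epsilon$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def prob_within(i, p, eps):
    """Exact P(|Z_i/i - p| <= eps) when Z_i is binomial with parameters i, p."""
    return sum(
        comb(i, z) * p**z * (1 - p) ** (i - z)
        for z in range(i + 1)
        if abs(Fraction(z, i) - p) <= eps
    )


def tchebycheff_bound(i, p, eps):
    """Lower bound 1 - p(1 - p)/(i * eps**2) from Tchebycheff's inequality."""
    return 1 - p * (1 - p) / (i * eps**2)


# Hypothetical values, kept as exact fractions to avoid rounding at the boundary.
p, eps = Fraction(3, 10), Fraction(1, 10)
exact = prob_within(100, p, eps)
bound = tchebycheff_bound(100, p, eps)
```

For these values the bound is 79/100, and the exact probability is larger than the bound, as the inequality guarantees. The bound is usually far from tight; its value lies in holding for every distribution with the stated mean.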


8.3. An Inequality on the Probability of a Combined Event. To simplify the notation, we denote the event (not $A$) by $\bar{A}$. Then we have the following useful inequality. If $A_1, A_2, \dots, A_k$ are any events, the inequality $P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) \ge 1 - P(\bar{A}_1) - P(\bar{A}_2) - \cdots - P(\bar{A}_k)$ holds.

To prove this, first we note that $P(\bar{A}_1 \text{ or } \bar{A}_2 \text{ or } \cdots \text{ or } \bar{A}_k) \le P(\bar{A}_1) + P(\bar{A}_2) + \cdots + P(\bar{A}_k)$. This is easily seen, since the right-hand side would overcount trials on which more than one of the events $\bar{A}_1, \bar{A}_2, \dots, \bar{A}_k$ occurs. Next we note that the event ($\bar{A}_1$ or $\bar{A}_2$ or $\cdots$ or $\bar{A}_k$) is the same event as not ($A_1$ and $A_2$ and $\cdots$ and $A_k$). Therefore $P(\bar{A}_1 \text{ or } \bar{A}_2 \text{ or } \cdots \text{ or } \bar{A}_k) = 1 - P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) \le P(\bar{A}_1) + P(\bar{A}_2) + \cdots + P(\bar{A}_k)$, and the proof follows very simply from this.

Note that if $A_1, A_2, \dots, A_k$ all have high probabilities, then the inequality shows that the event ($A_1$ and $A_2$ and $\cdots$ and $A_k$) also has a high probability.


8.4. Convergence of the Empirical cdf to the True cdf. Throughout this chapter, we are assuming that $X_1, \dots, X_m$ are independent chance variables, each with the same cdf $F(x)$. We have defined the empirical cdf $H(x; X_1, \dots, X_m)$ in Sec. 8.1. In this section, we shall show that for $m$ large, $H(x; X_1, \dots, X_m)$ will be close to $F(x)$ with high probability. We show this by means of three theorems.

Theorem 1. For any given value $b$, $H(b; X_1, \dots, X_m)$ converges stochastically to $F(b)$ as $m$ increases.

Proof. First we note that $mH(b; X_1, \dots, X_m)$ is a chance variable with a binomial distribution with parameters $m$, $F(b)$. For $mH(b; X_1, \dots, X_m)$ is the number of the values $X_1, \dots, X_m$ no greater than $b$. But $X_1, \dots, X_m$ are independent, and each has probability $F(b)$ of being no greater than $b$. This tells us that $mH(b; X_1, \dots, X_m)$ has a binomial distribution with parameters $m$, $F(b)$. But then the theorem follows immediately from the example given at the end of Sec. 8.2.


Theorem 2. For any given positive $\epsilon$, there is a finite number ($k$, say) of values $b_1, \dots, b_k$ such that if $|H(b_i; X_1, \dots, X_m) - F(b_i)| < \epsilon/2$ for $i = 1, \dots, k$, then $\max_x |H(x; X_1, \dots, X_m) - F(x)| < \epsilon$.

Proof. First we give the proof for the case where $F(x)$ is continuous. Then we can define $b_i$ as a value satisfying the equation $F(b_i) = i\epsilon/2$, for $i = 1, \dots, k$, where $k$ is the largest integer such that $k\epsilon/2 < 1$. Such values $b_i$ always exist, since $F(x)$ is continuous. We also define $b_0$ as $-\infty$ and $b_{k+1}$ as $+\infty$ for convenience in writing, noting that

\[ |H(b_0; X_1, \dots, X_m) - F(b_0)| = 0 \]

and

\[ |H(b_{k+1}; X_1, \dots, X_m) - F(b_{k+1})| = 0 \]

Next we show that if

\[ |H(b_i; X_1, \dots, X_m) - F(b_i)| < \frac{\epsilon}{2} \]

and

\[ |H(b_{i+1}; X_1, \dots, X_m) - F(b_{i+1})| < \frac{\epsilon}{2} \]

then

\[ |H(x; X_1, \dots, X_m) - F(x)| < \epsilon \quad \text{for all } x \text{ between } b_i \text{ and } b_{i+1} \]

Suppose not. Then we have either

Case 1: $H(x'; X_1, \dots, X_m) \ge F(x') + \epsilon$ for some $x'$ between $b_i$ and $b_{i+1}$

or

Case 2: $H(x'; X_1, \dots, X_m) \le F(x') - \epsilon$ for some $x'$ between $b_i$ and $b_{i+1}$

In case 1,

\[ H(x'; X_1, \dots, X_m) \ge F(x') + \epsilon \ge F(b_i) + \epsilon = \frac{i\epsilon}{2} + \epsilon = \frac{(i + 1)\epsilon}{2} + \frac{\epsilon}{2} \ge F(b_{i+1}) + \frac{\epsilon}{2} \]

which is impossible, since $H(x'; X_1, \dots, X_m) \le H(b_{i+1}; X_1, \dots, X_m) < F(b_{i+1}) + \epsilon/2$. Thus case 1 cannot occur.

In case 2,

\[ H(x'; X_1, \dots, X_m) \le F(x') - \epsilon \le F(b_{i+1}) - \epsilon \le \frac{(i + 1)\epsilon}{2} - \epsilon = \frac{i\epsilon}{2} - \frac{\epsilon}{2} = F(b_i) - \frac{\epsilon}{2} \]

which is impossible, since $H(x'; X_1, \dots, X_m) \ge H(b_i; X_1, \dots, X_m) > F(b_i) - \epsilon/2$. Thus case 2 cannot occur.

The impossibility of cases 1 and 2 shows that if $|H(b_i; X_1, \dots, X_m) - F(b_i)| < \epsilon/2$ and $|H(b_{i+1}; X_1, \dots, X_m) - F(b_{i+1})| < \epsilon/2$, then $|H(x; X_1, \dots, X_m) - F(x)| < \epsilon$ for all $x$ between $b_i$ and $b_{i+1}$. The theorem follows immediately.

The proof for the case where $F(x)$ is not continuous is exactly the same as the proof just given, if we can find values $b_i$ such that $F(b_i) = i\epsilon/2$ for all positive integers $i$ not greater than $k$, where $k$ is the largest integer with $k\epsilon/2 < 1$. If we cannot find such values, it is because of the presence of discontinuities in $F(x)$ at certain inconvenient places. This situation is handled by including among the points $b_1, \dots, b_k$ all points at which $F(x)$ jumps a distance of at least $\epsilon/2$. The details of the proof will not be carried out.

Theorem 3. $\max_x |H(x; X_1, \dots, X_m) - F(x)|$ converges stochastically to zero as $m$ increases.

Proof. We must show that for any given positive $\epsilon$, $\delta$, there is an integer $n(\epsilon,\delta)$ such that $P[\max_x |H(x; X_1, \dots, X_m) - F(x)| < \epsilon] > 1 - \delta$ for any $m > n(\epsilon,\delta)$. Let $A_i$ denote the event $|H(b_i; X_1, \dots, X_m) - F(b_i)| < \epsilon/2$, for $i = 1, \dots, k$, where $b_1 < b_2 < \cdots < b_k$ are the values described in Theorem 2. The event $A_1$ and $A_2$ and $\cdots$ and $A_k$ then implies the event $\max_x |H(x; X_1, \dots, X_m) - F(x)| < \epsilon$, so that

\[ P[\max_x |H(x; X_1, \dots, X_m) - F(x)| < \epsilon] \ge P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) \]

By Theorem 1, there is a positive integer $M$ such that $P(A_i) > 1 - \delta/k$ for any $m > M$, and $i = 1, \dots, k$. Then $P(\bar{A}_i) < \delta/k$ for any $m > M$, and $i = 1, \dots, k$. By the inequality of Sec. 8.3, $P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k) \ge 1 - P(\bar{A}_1) - P(\bar{A}_2) - \cdots - P(\bar{A}_k) > 1 - k(\delta/k) = 1 - \delta$, for any $m > M$. Thus $P[\max_x |H(x; X_1, \dots, X_m) - F(x)| < \epsilon] > 1 - \delta$ for any $m > M$, and this completes the proof of Theorem 3, with $n(\epsilon,\delta) = M$.
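The quantity in Theorem 3 can be computed for a given sample when $F(x)$ is known. The sketch below is an illustration that assumes $F(x)$ is continuous, so that the largest discrepancy occurs at, or just below, one of the observed values; the function name and the example sample are made up.

```python
def max_cdf_distance(sample, F):
    """Largest value of |H(x; X_1, ..., X_m) - F(x)| over all x.

    Assumes F is a continuous cdf. H is constant between observations, so it
    is enough to compare F at each observation with the value of H at that
    point and with the value of H just below it.
    """
    ordered = sorted(sample)
    m = len(ordered)
    worst = 0.0
    for j, x in enumerate(ordered):
        below = j / m          # value of H just below x (up to ties)
        at = (j + 1) / m       # value of H at x (up to ties)
        worst = max(worst, abs(at - F(x)), abs(F(x) - below))
    return worst


def uniform_cdf(x):
    """Cdf of a chance variable spread evenly over the interval from 0 to 1."""
    return min(max(x, 0.0), 1.0)


distance = max_cdf_distance([0.25, 0.5, 0.75], uniform_cdf)
```

For this three-point sample the distance is 0.25, attained just below the smallest observation and again at the largest. Theorem 3 says that for large $m$ this distance is small with high probability.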


8.5. The Empirical Decision Rule. In this chapter we are discussing problems in which $X_1, \dots, X_m, Y_1, \dots, Y_n$ are all independent, and each has cdf $F(x)$, and the loss does not depend on $X_1, \dots, X_m$. For any given decision $D$, the expected value of the loss depends only on $D$ and $F(x)$, and we denote this value by $S(F;D)$. We assume that $S(F;D)$ changes only slightly if $F(x)$ changes slightly. To be more precise, if $\epsilon$ is any given positive value, there is another positive value $\delta$ such that $|S(F;D) - S(G;D)| < \epsilon$ for all $D$ if $G(x)$ is any cdf such that $\max_x |F(x) - G(x)| < \delta$. In such a problem, the following decision rule seems reasonable: Choose the value of $D$ that minimizes $S[H(x; X_1, \dots, X_m); D]$. This decision rule will be called the "empirical decision rule." It seems reasonable because we have shown that with high probability $\max_x |H(x; X_1, \dots, X_m) - F(x)|$ will be small, and therefore by the assumption just made, $S[H(x; X_1, \dots, X_m); D]$ will be close to $S(F;D)$ for all $D$. The best decision is the one that minimizes $S(F;D)$, but since $F(x)$ is not known, we use $H(x; X_1, \dots, X_m)$ in its place as an estimate.

We conclude this section with an example illustrating an important class of problems. A newspaper vendor must decide how many copies of a monthly magazine to buy from the publisher. He may buy any number from 0 to $C$, where $C$ is a given positive integer. He pays the publisher $w$ dollars per copy, sells to customers at $r$ dollars per copy, and returns unsold copies to the publisher, receiving $s$ dollars per unsold copy returned. We assume $s < w < r$. $Y$ denotes the number of magazines that will be requested during the coming month, and $X_1, X_2, \dots, X_m$ denote the numbers that were requested during the $m$ preceding months. We assume that $X_1, \dots, X_m, Y$ are independent chance variables with a common but unknown cdf $F(x)$. We denote $P(X = x)$ by $f(x)$, where of course $f(x) = 0$ unless $x$ is a nonnegative integer. $D$ denotes the number of copies the vendor will order from the publisher. Then the loss function is given as follows:

\[ W(Y;D) = wD - rY - s(D - Y) \quad \text{if } Y \le D \]
\[ W(Y;D) = wD - rD \quad \text{if } Y > D \]

This simplifies to

\[ W(Y;D) = (w - s)D + (s - r)Y \quad \text{if } Y \le D \]
\[ W(Y;D) = (w - r)D \quad \text{if } Y > D \]

Then we find

\[ S(F;D) = \sum_{y=0}^{D} [(w - s)D + (s - r)y]\, f(y) + \sum_{y=D+1}^{\infty} (w - r) D f(y) \]
\[ = (w - s) D F(D) + (s - r) \sum_{y=0}^{D} y f(y) + (w - r) D [1 - F(D)] \]
\[ = (w - r)D + (r - s) D F(D) + (s - r) \sum_{y=1}^{D} y [F(y) - F(y - 1)] \]

From this last expression, it is easily seen that $S(F;D)$ changes only slightly if $F(x)$ changes slightly. Therefore it seems reasonable to use the empirical decision rule for this problem.

In order to find which decision is chosen by the empirical decision
rule, we investigate the shape of the function S(F;D). Computing the
difference S(F;D + 1) − S(F;D), we get

    S(F;D + 1) − S(F;D)
        = (w − r) + (r − s)(D + 1)F(D + 1) − (r − s)D F(D)
          + (s − r)(D + 1)[F(D + 1) − F(D)]
        = (w − r) + (r − s)F(D)

From the last expression, it is easily seen that S(F;D + 1) − S(F;D) is
negative if F(D) < (r − w)/(r − s), is zero if F(D) = (r − w)/(r − s),
and is positive if F(D) > (r − w)/(r − s). This means that the function
S(F;D) is minimized by setting D = Q + 1, where Q is the largest
positive integer such that F(Q) < (r − w)/(r − s). Since the empirical
decision rule acts as though the unknown F(x) were exactly equal to the
known empirical cdf H(x; X_1, ..., X_m), and since the largest possible
decision is equal to C, the empirical decision rule chooses a decision
as follows: D = min (C, Q' + 1), where Q' is the largest positive
integer such that H(Q'; X_1, ..., X_m) < (r − w)/(r − s).
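
The vendor's rule can be sketched in a few lines of code. The following
is only an illustration under stated assumptions: the function names,
the demand figures, and the prices are hypothetical and do not come
from the text. The order quantity is found as the smallest D for which
the empirical cdf reaches (r − w)/(r − s), which agrees with
min (C, Q' + 1) whenever Q' exists, and the average loss over the
observed demands plays the part of S(H;D).

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return H(x), the fraction of observations less than or equal to x."""
    data = sorted(sample)
    return lambda x: bisect_right(data, x) / len(data)


def empirical_order(sample, w, r, s, C):
    """Empirical decision rule: smallest D with H(D) >= (r - w)/(r - s), capped at C."""
    H = empirical_cdf(sample)
    ratio = (r - w) / (r - s)
    D = 0
    while D < C and H(D) < ratio:
        D += 1
    return D


def expected_loss(sample, D, w, r, s):
    """S(H; D): the loss W(y; D) averaged over the observed demands y."""
    total = 0.0
    for y in sample:
        total += (w - s) * D + (s - r) * y if y <= D else (w - r) * D
    return total / len(sample)


# Hypothetical data: demands in the eight preceding months.
demands = [3, 5, 4, 6, 2, 5, 7, 4]
order = empirical_order(demands, w=2, r=5, s=1, C=10)
losses = [expected_loss(demands, D, w=2, r=5, s=1) for D in range(11)]
```

With these figures (r − w)/(r − s) = 0.75, and the list `losses` can be
inspected to confirm that no other order quantity gives a smaller
average loss than `order`.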


The Empirical Decision Rule and Bayes Decision Rules. In the
discussion in this chapter, we have not yet mentioned admissible
decision rules or Bayes decision rules, a fact which should seem
puzzling. In this section we attempt to explain this fact.

In the whole discussion preceding this chapter, the unknown
distribution was either one of a given finite number of possible
distributions, or else one of a given family of distributions generated
by the variation of a finite number of parameters. In such cases, we
discussed Bayes decision rules and knew that each admissible decision
rule is a Bayes decision rule or else a limit of Bayes decision rules.
In the present chapter, we did not assume that the unknown distribution
is one of a finite number of possible distributions, or one of a family
of distributions generated by the variation of a finite number of
parameters. With such a wide variety of possible distributions as we
are allowing in the present chapter, it is not known whether each
admissible decision rule is a Bayes decision rule or a limit of Bayes
decision rules. Since the standard method for finding admissible
decision rules may not work in this case, we employ a decision rule
which has a certain intuitive appeal.

Chapter 9


CONVENTIONAL STATISTICAL THEORY


9.1. Introduction. In Sec. 5.3 we pointed out that the standard
formulation of statistical problems has the loss depending on θ, D, and
X_1, ..., X_m, and that Y_1, ..., Y_n are not mentioned in the standard
problem. All the techniques that we have developed for finding Bayes
and admissible decision rules apply to the standard formulation, once
some simple changes in notation have been made. We outline these
changes in notation.
The loss function in the standard problem is written as W(θ;D;x), since
Y_1, ..., Y_n do not appear. When the joint distribution of
X_1, ..., X_m allows us to list the possible values of the chance
variables, f(x;θ) denotes the probability that X_1 = x_1, ...,
X_m = x_m when the parameter equals θ. If there is a finite number L of
possible decisions in our problem and the possible values of
X_1, ..., X_m can be listed, then

    r(θ;s) = Σ_x Σ_{D=1}^{L} W(θ;D;x) f(x;θ) s(D;x)

If X_1, ..., X_m have a joint pdf, then

    r(θ;s) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} Σ_{D=1}^{L} W(θ;D;x) f(x;θ) s(D;x) dx_1 ··· dx_m

The quantity K(D;x) introduced in Sec. 5.9 becomes in the present
notation

    Σ_{θ=1}^{h} b(θ) W(θ;D;x) f(x;θ)

and is used in the construction of a Bayes decision rule relative to
b(1), ..., b(h) in exactly the same way as in Sec. 5.9.

The quantity K(D;x) introduced in Sec. 5.10 becomes

    ∫ b(θ) W(θ;D;x) f(x;θ) dθ

and is used in exactly the same way as in Sec. 5.10.

To conclude this section, we describe a method for turning a decision
problem of the type discussed in Chap. 5 (where the loss depended on
Y_1, ..., Y_n) into a problem of the type discussed in this chapter
(where the loss depends on θ and not on Y_1, ..., Y_n). We carry out the
details only for the case where the possible values of
X_1, ..., X_m, Y_1, ..., Y_n can be listed and there is a finite number
L of possible decisions. Other cases can be handled similarly. From
Sec. 5.6,

    r(θ;s) = Σ_x Σ_y Σ_{D=1}^{L} W(y;D) f(x,y;θ) s(D;x)

We denote Σ_y f(x,y;θ) by f(x;θ) and note that f(x;θ) is the marginal
distribution for X_1, ..., X_m given by the distribution f(x,y;θ). Also,
denote

    Σ_y W(y;D) f(x,y;θ) / f(x;θ)

by W(θ;D;x). Then we have

    r(θ;s) = Σ_x Σ_{D=1}^{L} W(θ;D;x) f(x;θ) s(D;x)

But this is the expected loss for a problem in which the loss function
is W(θ;D;x) and the possible distributions for X_1, ..., X_m are given
by f(x;θ). This is a problem of the type discussed in this chapter,
where Y_1, ..., Y_n do not appear.

9.2. Testing a Hypothesis. A very common type of conventional
statistical problem, known as the "problem of testing a hypothesis,"
will be described here.

There are two possible decisions, which we label as 1, 2. The possible
values of θ are broken into three nonoverlapping groups: groups I, II,
and III. The loss does not depend on the values of X_1, ..., X_m, so we
write the loss function as W(θ;D). The value of W(θ;D) depends on θ
only through the group in which θ falls, and is given by the following
table.

                 θ in group I   θ in group II   θ in group III
    D = 1             0               1                0
    D = 2             1               0                0

Clearly, if θ is in group I, we should like to choose decision 1; if θ
is in group II, we should like to choose decision 2; if θ is in group
III, it does not matter which decision we choose. As a matter of
terminology, choosing decision 1 is called "accepting the hypothesis
that θ is in group I" and choosing decision 2 is called "rejecting the
hypothesis that θ is in group I."

It is easily seen that r(θ;s) has the following properties: if θ is in
group I, r(θ;s) = P(decision 2 chosen by decision rule s); if θ is in
group II, r(θ;s) = P(decision 1 chosen by decision rule s); if θ is in
group III, r(θ;s) = 0.

We show this for the case where the possible values of X_1, ..., X_m
can be listed. Then

    r(θ;s) = Σ_x Σ_{D=1}^{2} W(θ;D) f(x;θ) s(D;x)

Then if θ is in group I, W(θ;1) = 0, W(θ;2) = 1, so we have

    r(θ;s) = Σ_x f(x;θ) s(2;x)

and it is easily seen that this last sum represents the probability that
decision 2 will be chosen when the decision rule s is used. The
demonstrations for θ in group II and for θ in group III are entirely
similar.
Usually there is an additional aspect to the problem of testing a
hypothesis. There is a preassigned value α (0 < α < 1), and we are
limited to the use of a decision rule s with the property that
r(θ;s) ≤ α for all θ in group I. The value α is known as the "level of
significance." Among all decision rules satisfying this restriction, we
should like to find a decision rule which minimizes max r(θ;s), the
maximum being taken over θ in group II. Such a decision rule is called
a "minimax test of level of significance α."

In problems where there is a finite number of distributions in both
groups I and II, and the possible values of x can be listed, a minimax
test of level of significance α can be constructed by the use of linear
programming. In the linear programming problem, the unknowns are s(D;x)
for D = 1, 2 and all values of x, and max r(θ;s) over θ in group II.
The equalities and inequalities are

    s(1;x) + s(2;x) = 1                        for each x
    Σ_x f(x;θ) s(2;x) ≤ α                      for each θ in I
    Σ_x f(x;θ) s(1;x) − max r(θ;s) ≤ 0         for each θ in II

and it is desired to find the values of the unknowns which minimize
max r(θ;s) over θ in group II.

As a numerical example, we take a case where m = 1, group I consists of
only one distribution, and group II consists of two distributions. The
distribution in group I is the binomial distribution with parameters 3,
0.4. The distributions in group II are the binomial distributions with
parameters 3, 0.3, and 3, 0.5, respectively. The value of α is 0.10. It
is not difficult to see that in any problem where there is only one
distribution in group I, the inequality r(θ;s) ≤ α for the single θ in
group I may be replaced by the equality r(θ;s) = α for the single θ in
group I. This is so because if r(θ;s) were below α for the θ in group I,
we could raise s(2;x) for some x to make r(θ;s) equal to α, and this
could not increase max r(θ;s) over group II. Then our linear
programming equalities for the present problem are

    s(1;0) + s(2;0) = 1
    s(1;1) + s(2;1) = 1
    s(1;2) + s(2;2) = 1
    s(1;3) + s(2;3) = 1
    0.216s(2;0) + 0.432s(2;1) + 0.288s(2;2) + 0.064s(2;3) = 0.10
    0.343s(1;0) + 0.441s(1;1) + 0.189s(1;2) + 0.027s(1;3) + z_1 − max r(θ;s) = 0
    0.125s(1;0) + 0.375s(1;1) + 0.375s(1;2) + 0.125s(1;3) + z_2 − max r(θ;s) = 0

where z_1, z_2 are nonnegative slack variables and max r(θ;s) is taken
over θ in II. As a start, we set s(1;0) = 1, s(1;1) = 1, s(2;3) = 1,
s(2;2) = 0.125 (this last value is to make r(θ;s) = 0.10 for θ in group
I). Our first tableau then has s(2;0), s(2;1), s(1;3), z_1 as the
variables set equal to zero, and gives the following values for the
remaining variables:

    s(1;0)              1
    s(1;1)              1
    s(2;3)              1
    s(1;2)              0.875
    s(2;2)              0.125
    z_2                 0.121
    max r(θ;s), θ in II 0.949

Ü in 
Raising s(2;0), we get as our second tableau

                             s(2;2)    s(2;1)    s(1;3)    z_1
    s(2;0)          0.167    −1.333    −2         0.296    0
    s(1;0)          0.833     1.333     2        −0.296    0
    s(1;1)          1         0        −1         0        0
    s(2;3)          1         0         0        −1        0
    s(1;2)          1        −1         0         0        0
    z_2             0.062     0.477    −0.369    −0.163    1
    max r(θ;s)      0.916     0.268     0.245    −0.074    1

Raising s(1;3), we get as our third tableau

                             z_2       s(2;2)    s(2;1)    z_1
    s(1;3)          0.381    −6.13      2.93     −2.26      6.13
    s(2;0)          0.279    −1.816    −0.466    −2.67      1.816
    s(1;0)          0.721     1.816     0.466     2.67     −1.816
    s(1;1)          1         0         0        −1         0
    s(2;3)          0.619     6.13     −2.93      2.26     −6.13
    s(1;2)          1         0        −1         0         0
    max r(θ;s)      0.888     0.454     0.051     0.412     0.546

The fact that the last row of this third tableau contains only positive
entries tells us that it represents a solution to our problem. Thus a
minimax test of level of significance 0.1 for this problem is given by
setting s(1;0) = 0.721, s(1;1) = 1, s(1;2) = 1, s(1;3) = 0.381.


9.3. Testing a One-sided Hypothesis. A very common special type
of hypothesis testing problem is where the possible joint distributions
are given by the variation of a single parameter, denoted by θ, and
group I consists of all distributions given by values of θ less than or
equal to A, group II consists of all distributions given by values of θ
greater than or equal to B, and group III consists of all distributions
given by values of θ between A and B. Here A and B are given constants
with A < B. This problem is called the "problem of testing a one-sided
hypothesis."




Suppose there is a chance variable Z which is sufficient for the
decision problem, and let g(z;θ) denote the pdf for Z when the parameter
equals θ (if Z has a pdf), or denote P(Z = z) when the parameter equals
θ if the possible values of Z can be listed. For each and every pair of
values θ_1, θ_2 with θ_1 < θ_2, the ratio g(z;θ_2)/g(z;θ_1) increases as
z increases.

We are going to show that for any problem of testing a one-sided
hypothesis in which the conditions of the preceding paragraph are
satisfied, the following decision rule s is a minimax test of level of
significance α: Choose D = 1 if Z < c, choose D = 2 if Z > c, assign
probability p to choosing D = 1 if Z = c, where c and p are quantities
chosen to make r(A;s) = α. [If Z is a continuous chance variable,
P(Z = c) = 0, so the decision rule s is simplified by the elimination
of p.]

Before proving that s is minimax, we list several examples where our
conditions hold:

1. X_1, ..., X_m are independent, each with a binomial distribution
with the same parameters n, θ, where n is a known positive integer, and
θ is an unknown quantity between 0 and 1. Here we have

    f(x_1, ..., x_m; θ) = [n!/(x_1!(n − x_1)!)] [n!/(x_2!(n − x_2)!)] ··· [n!/(x_m!(n − x_m)!)]
                          × θ^(x_1 + ··· + x_m) (1 − θ)^(nm − (x_1 + ··· + x_m))

Then we see that Z = X_1 + ··· + X_m is sufficient for the decision
problem. Z has a binomial distribution with parameters nm, θ, so

    g(z;θ) = [(nm)!/(z!(nm − z)!)] θ^z (1 − θ)^(nm − z)

Then

    g(z;θ_2)/g(z;θ_1) = (θ_2/θ_1)^z [(1 − θ_2)/(1 − θ_1)]^(nm − z)
                      = [(1 − θ_2)/(1 − θ_1)]^(nm) [θ_2(1 − θ_1)/(θ_1(1 − θ_2))]^z

and since θ_2 > θ_1, it is easily seen that this last expression
increases as z increases.
2. X_1, ..., X_m are independent, each with a normal distribution
with the same mean θ and standard deviation σ, σ being known, θ
unknown. Then

    f(x_1, ..., x_m; θ) = (1/(σ√(2π)))^m exp[−(1/2σ²) Σ_{i=1}^{m} (x_i − θ)²]

Denoting (x_1 + ··· + x_m)/m by z, we find that f(x_1, ..., x_m; θ) can
be written

    (1/(σ√(2π)))^m exp[−(1/2σ²) Σ_{i=1}^{m} (x_i − z)²] exp[−(m/2σ²)(z − θ)²]

From this, we see that Z = (X_1 + ··· + X_m)/m is sufficient for the
decision problem. Z has a normal distribution with parameters θ, σ/√m,
so

    g(z;θ) = (√m/(σ√(2π))) exp[−(m/2σ²)(z − θ)²]

Then

    g(z;θ_2)/g(z;θ_1) = exp[−(m/2σ²)(z − θ_2)² + (m/2σ²)(z − θ_1)²]
                      = exp[−(m/2σ²)(θ_2² − θ_1²)] exp[(m/σ²)(θ_2 − θ_1)z]

and since θ_2 > θ_1, it is easily seen that this last expression
increases as z increases.
3. X_1, ..., X_m are independent, each with a Poisson distribution
with the same parameter θ. Then

    f(x_1, ..., x_m; θ) = e^(−mθ) θ^(x_1 + ··· + x_m) / (x_1! x_2! ··· x_m!)

and we see that Z = X_1 + ··· + X_m is sufficient for the decision
problem. Z has a Poisson distribution with parameter mθ, so
g(z;θ) = e^(−mθ)(mθ)^z/z!. Then
g(z;θ_2)/g(z;θ_1) = e^(−m(θ_2 − θ_1))(θ_2/θ_1)^z, and since θ_2 > θ_1,
it is easily seen that this last expression increases as z increases.

Now we turn to the proof that the decision rule s described above is
minimax. For simplicity, we carry out the details only for the case
where Z has a pdf. Let G(z;θ) denote the cdf for Z when the parameter
is equal to θ. First we show that G(z;θ) does not increase as θ
increases, for any fixed z. Suppose the contrary. Then there would be
values z, θ_1, θ_2, with θ_1 < θ_2, such that G(z;θ_2) > G(z;θ_1).
There are two possible cases:

1. g(z;θ_1) ≥ g(z;θ_2)

2. g(z;θ_1) < g(z;θ_2)

In case 1, g(w;θ_2) ≤ g(w;θ_1) for all w < z, because g(z;θ_2)/g(z;θ_1)
increases as z increases. But then

    ∫_{−∞}^{z} g(w;θ_2) dw ≤ ∫_{−∞}^{z} g(w;θ_1) dw    or    G(z;θ_2) ≤ G(z;θ_1)

contradicting our assumption that G(z;θ_2) > G(z;θ_1). In case 2,
g(w;θ_2) > g(w;θ_1) for all w > z, and then

    ∫_{z}^{∞} g(w;θ_2) dw > ∫_{z}^{∞} g(w;θ_1) dw    or    1 − G(z;θ_2) > 1 − G(z;θ_1)
                                                      or    G(z;θ_2) < G(z;θ_1)

contradicting our assumption that G(z;θ_2) > G(z;θ_1). Since the
assumption that G(z;θ_2) > G(z;θ_1) leads to a contradiction, we have
shown that G(z;θ) does not increase as θ increases, for any fixed z.
Our next step in showing that s is minimax is to construct a Bayes
decision rule s_b relative to the a priori distribution B(θ) which
assigns probability b to the point θ = A and probability 1 − b to the
point θ = B, thus assigning zero probability to all other values of θ.
Using the loss function given in Sec. 9.2, we find
K(1;z) = (1 − b)g(z;B), K(2;z) = bg(z;A). Then the Bayes decision rule
chooses D = 1 if K(1;z) < K(2;z) and chooses D = 2 if K(1;z) > K(2;z).
This is equivalent to saying that the Bayes decision rule s_b chooses
D = 1 if g(z;B)/g(z;A) < b/(1 − b) and chooses D = 2 if
g(z;B)/g(z;A) > b/(1 − b). Since we are assuming that g(z;B)/g(z;A)
increases as z increases, this is the same as saying that s_b chooses
D = 1 if Z < c(b) and chooses D = 2 if Z > c(b), where c(b) is a
quantity depending only on b. Clearly, c(b) increases as b increases.
We have r(A;s_b) = 1 − G(c(b);A), and we can find a value of b, say,
b', such that 1 − G(c(b');A) = α. Then we show that s_b' is a minimax
test of level of significance α, as follows. For any θ in group I,
r(θ;s_b') = 1 − G(c(b');θ). Since A is the largest value of θ in group
I, we have G(c(b');θ) ≥ G(c(b');A), or
1 − G(c(b');θ) ≤ 1 − G(c(b');A) = α for any θ in group I. Thus s_b'
does have level of significance α. For any θ in group II,
r(θ;s_b') = G(c(b');θ). Since B is the smallest value of θ in group II,
r(θ;s_b') ≤ r(B;s_b') for any θ in group II; that is,

    r(B;s_b') = max r(θ;s_b'), the maximum being taken over θ in II

Now suppose that s_b' were not a minimax test of level of significance
α. Then there would be a decision rule t, with r(A;t) ≤ α and
r(B;t) < r(B;s_b'). But since s_b' is a Bayes decision rule, we must
have b'r(A;s_b') + (1 − b')r(B;s_b') ≤ b'r(A;t) + (1 − b')r(B;t), and
this could happen only if b' = 1. But then s_b' would always choose
D = 1, and therefore α would be equal to zero. We are assuming that α
is positive, and therefore s_b' is minimax. s_b' is the same decision
rule as the decision rule s described earlier in this section.


As a numerical example, suppose m = 4 and X_1, X_2, X_3, X_4 are
independent, each with a normal distribution with standard deviation
equal to 1 and unknown mean θ. A = 2, B = 6, and α = 0.1. Defining Z as
(X_1 + X_2 + X_3 + X_4)/4, we know that the minimax test of level of
significance 0.1 chooses D = 1 when Z < c and D = 2 when Z > c, where c
is chosen to make r(2;s) = 0.1. But when θ = 2, Z has a normal
distribution with mean equal to 2 and standard deviation equal to 1/2.
Thus c satisfies the equation

    (2/√(2π)) ∫_{c}^{∞} e^(−2(z − 2)²) dz = 0.1

Making the transformation v = (z − 2)/0.5, we get

    (1/√(2π)) ∫_{(c − 2)/0.5}^{∞} e^(−v²/2) dv = 0.1

Table 1 in the Appendix shows that (c − 2)/0.5 must be equal to 1.28,
so that c = 2.64. Note that the value of c would not be changed by a
change in B.
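
The table lookup in this example can be reproduced with the normal
distribution in the Python standard library. This is a small sketch
only; the function name and its argument names are hypothetical and do
not come from the text.

```python
from statistics import NormalDist


def one_sided_cutoff(A, sigma, m, alpha):
    """Cutoff c with P(Z > c) = alpha when Z is normal with mean A and
    standard deviation sigma / sqrt(m)."""
    z_dist = NormalDist(mu=A, sigma=sigma / m**0.5)
    return z_dist.inv_cdf(1 - alpha)


# The example above: m = 4, standard deviation 1, A = 2, alpha = 0.1.
c = one_sided_cutoff(A=2, sigma=1, m=4, alpha=0.1)
```

The value of `c` agrees with 2.64 to two decimal places. B does not
enter the computation, in line with the remark that c is unchanged by a
change in B.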


9.4. Testing a Two-sided Hypothesis. Another common special type
of hypothesis testing problem is where the possible joint distributions
are given by the variation of a single parameter, denoted by θ, and
group I consists of the single distribution given by θ = A, group II
consists of all the distributions given by values of θ less than or
equal to B_1, plus all the distributions given by values of θ greater
than or equal to B_2, while group III consists of all distributions
given by values of θ between B_1 and B_2, excluding the value A. Here
A, B_1, B_2 are given values with B_1 < A < B_2. This problem is called
the "problem of testing a two-sided hypothesis."

In many important cases, if we construct a decision rule s which is
Bayes relative to the a priori distribution B(θ) which assigns
probability b_1 to the point θ = B_1, probability b_2 to the point
θ = B_2, and probability 1 − b_1 − b_2 to the point θ = A, where b_1
and b_2 are chosen so that r(B_1;s) = r(B_2;s) and r(A;s) = α, we find
that s is a minimax test. We illustrate this for the case where
X_1, X_2, ..., X_m are independent, and each has a normal distribution
with known standard deviation σ and unknown mean θ. We know that
Z = (1/m)(X_1 + ··· + X_m) is sufficient for this problem, and that Z
has a normal distribution with standard deviation σ/√m and mean θ.
Using the a priori distribution B(θ) described, we find

    K(1;z) = b_1 (√m/(σ√(2π))) exp[−(m/2σ²)(z − B_1)²]
             + b_2 (√m/(σ√(2π))) exp[−(m/2σ²)(z − B_2)²]

    K(2;z) = (1 − b_1 − b_2)(√m/(σ√(2π))) exp[−(m/2σ²)(z − A)²]

A detailed but straightforward investigation of the ratio
K(1;z)/K(2;z) shows that for any given values c_1, c_2 with c_1 < c_2,
there are values of b_1, b_2 with b_1 > 0, b_2 > 0, 1 − b_1 − b_2 > 0,
such that K(1;z)/K(2;z) < 1 if c_1 < z < c_2 and K(1;z)/K(2;z) > 1 if
z < c_1 or if z > c_2. Then the decision rule s(c_1,c_2) which chooses
D = 1 if c_1 < Z < c_2 and D = 2 for other values of Z is Bayes
relative to B(θ) for properly chosen b_1, b_2, with b_1 > 0, b_2 > 0,
1 − b_1 − b_2 > 0. For any θ in group II,

    r(θ;s(c_1,c_2)) = (√m/(σ√(2π))) ∫_{c_1}^{c_2} exp[−(m/2σ²)(z − θ)²] dz
We now show that this last expression, as a function of θ, increases as
θ increases from −∞ to (1/2)(c_1 + c_2) and decreases as θ increases
from (1/2)(c_1 + c_2) to ∞. This is shown by the fact that

    (∂/∂θ) r(θ;s(c_1,c_2))
        = (√m/(σ√(2π))) ∫_{c_1}^{c_2} (m/σ²)(z − θ) exp[−(m/2σ²)(z − θ)²] dz
        = (√m/(σ√(2π))) {exp[−(m/2σ²)(c_1 − θ)²] − exp[−(m/2σ²)(c_2 − θ)²]}

and this derivative is zero for θ = (1/2)(c_1 + c_2), positive for
θ < (1/2)(c_1 + c_2), and negative for θ > (1/2)(c_1 + c_2). Next let
c_1', c_2' be the values such that
r(B_1;s(c_1',c_2')) = r(B_2;s(c_1',c_2')) and r(A;s(c_1',c_2')) = α.
Such values c_1', c_2' always exist. Now we can show that s(c_1',c_2')
is a minimax test of level of significance α. First, it is clear that

    r(B_1;s(c_1',c_2')) = r(B_2;s(c_1',c_2')) = max r(θ;s(c_1',c_2')) over θ in II

because of the shape of r(θ;s(c_1',c_2')) that we established above.
Now suppose that s(c_1',c_2') is not minimax. Then there would be a
decision rule t with

    r(A;t) ≤ α,  r(B_1;t) < r(B_1;s(c_1',c_2')),  and  r(B_2;t) < r(B_2;s(c_1',c_2'))

We know that s(c_1',c_2') is a Bayes decision rule relative to B(θ) for
properly chosen values b_1, b_2, with b_1 > 0, b_2 > 0,
1 − b_1 − b_2 > 0. Then we must have

    b_1 r(B_1;s(c_1',c_2')) + b_2 r(B_2;s(c_1',c_2')) + (1 − b_1 − b_2) r(A;s(c_1',c_2'))
        ≤ b_1 r(B_1;t) + b_2 r(B_2;t) + (1 − b_1 − b_2) r(A;t)

This inequality contradicts the assumption that r(A;t) ≤ α,
r(B_1;t) < r(B_1;s(c_1',c_2')), r(B_2;t) < r(B_2;s(c_1',c_2')). The
contradiction proves that s(c_1',c_2') is minimax.
Before taking up a numerical example, we note that c_1', c_2' will be
symmetrically placed around B_1, B_2; that is,
c_1' = (1/2)(B_1 + B_2) − d, c_2' = (1/2)(B_1 + B_2) + d, where d is
some positive value chosen to make r(A;s(c_1',c_2')) = α. Suppose we
set m = 16, σ = 2, A = 3, B_1 = 2, B_2 = 5, α = 0.05. Then
c_1' = 3.5 − d, c_2' = 3.5 + d, where d must be chosen so that

    r(3;s(c_1',c_2')) = 1 − (2/√(2π)) ∫_{3.5 − d}^{3.5 + d} e^(−2(z − 3)²) dz = 0.05

Making the transformation y = 2(z − 3), we find that d satisfies the
equation

    (1/√(2π)) ∫_{1 − 2d}^{1 + 2d} e^(−y²/2) dy = 0.95

By trial and error in Table 1 in the Appendix, we find that d is
approximately 1.32. Thus c_1' = 2.18, c_2' = 4.82.
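
The trial-and-error search for d can be mechanized. The sketch below
replaces the table lookup by bisection on the probability that Z falls
between 3.5 − d and 3.5 + d when θ = A; this probability increases with
d, so bisection applies. The function name, its arguments, and the
tolerance are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def two_sided_half_width(A, B1, B2, sigma, m, alpha, tol=1e-10):
    """Half-width d for which the interval centered at (B1 + B2)/2 has
    probability 1 - alpha when Z is normal with mean A and standard
    deviation sigma / sqrt(m)."""
    center = 0.5 * (B1 + B2)
    z_dist = NormalDist(mu=A, sigma=sigma / m**0.5)

    def accept_probability(d):
        return z_dist.cdf(center + d) - z_dist.cdf(center - d)

    lo, hi = 0.0, 1.0
    while accept_probability(hi) < 1 - alpha:  # widen until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if accept_probability(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The example above: m = 16, sigma = 2, A = 3, B_1 = 2, B_2 = 5, alpha = 0.05.
d = two_sided_half_width(A=3, B1=2, B2=5, sigma=2, m=16, alpha=0.05)
c1, c2 = 3.5 - d, 3.5 + d
```

The computed `d` agrees with the tabulated value 1.32 to two decimal
places, and `c1`, `c2` agree with 2.18 and 4.82 to within the same
rounding.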


9.5. Point Estimation. Another very common type of conventional
statistical problem is the problem of "point estimation," which will be
described in this section.

The possible joint distributions are given by the variation of a single
parameter θ over a given interval, which may be an infinite interval.
The possible decisions are the possible values of θ. Standard
discussions of the problem of point estimation make it clear that we
should like our decision to be close to the true value of θ, but
usually do not specify a loss function. However, the following type of
loss function seems to be suitable, in the light of most discussions of
point estimation:

    W(θ;D) = c(θ)(D − θ)²

where c(θ) is a function of θ which is never negative.

With the type of loss function introduced in the preceding paragraph,
suppose we want to construct a Bayes decision rule relative to an a
priori distribution B(θ) with pdf b(θ). Then

    K(D;x) = ∫ b(θ)c(θ)(D − θ)² f(x_1, ..., x_m; θ) dθ

Expanding (D − θ)², we find

    K(D;x) = D² ∫ b(θ)c(θ) f(x_1, ..., x_m; θ) dθ
             − 2D ∫ θ b(θ)c(θ) f(x_1, ..., x_m; θ) dθ
             + ∫ θ² b(θ)c(θ) f(x_1, ..., x_m; θ) dθ

The Bayes decision rule chooses the value of D that minimizes K(D;x),
and by solving the equation (∂/∂D)K(D;x) = 0, it is easily found that
the minimizing value of D is

    ∫ θ b(θ)c(θ) f(x_1, ..., x_m; θ) dθ / ∫ b(θ)c(θ) f(x_1, ..., x_m; θ) dθ
As an example of the computation described in the preceding paragraph,
we take the case where X_1, ..., X_m are independent, each with a
normal distribution with the same parameters: known standard deviation
σ and unknown mean θ. c(θ) = 1, so that W(θ;D) = (D − θ)². Suppose we
want to construct a Bayes decision rule s_v relative to the a priori
distribution with pdf b_v(θ) = (1/(v√(2π))) e^(−θ²/2v²). Defining Z as
(1/m)(X_1 + ··· + X_m), we know that Z is sufficient for the decision
problem. Z has a normal distribution with standard deviation σ/√m and
mean θ, so the pdf for Z is (√m/(σ√(2π))) exp [−(m/2σ²)(z − θ)²]. Then,
using the formula given in the preceding paragraph, we find that s_v
chooses the decision

    ∫ θ (1/(v√(2π))) exp(−θ²/2v²) (√m/(σ√(2π))) exp[−(m/2σ²)(z − θ)²] dθ
    ---------------------------------------------------------------------
    ∫ (1/(v√(2π))) exp(−θ²/2v²) (√m/(σ√(2π))) exp[−(m/2σ²)(z − θ)²] dθ

After canceling common constant factors in numerator and denominator,
combining exponents, and making the change of variable
u = θ√(1/v² + m/σ²), we can write the decision chosen by s_v as

    [1/√(1/v² + m/σ²)] ∫ u e^(−u²/2 + au) du / ∫ e^(−u²/2 + au) du        (9.1)

where a denotes (mz/σ²)/√(1/v² + m/σ²), and both integrals run from −∞
to ∞. But the integral in the denominator of (9.1) has been evaluated
in Sec. 4.6, where we found

    ∫_{−∞}^{∞} e^(−u²/2 + au) du = √(2π) e^(a²/2)

Also the integral in the numerator of (9.1) is equal to the derivative
with respect to a of the integral in the denominator of (9.1) and is
thus equal to (d/da)[√(2π) e^(a²/2)] = a√(2π) e^(a²/2). Therefore the
expression (9.1) is equal to

    a/√(1/v² + m/σ²) = (mz/σ²)/(1/v² + m/σ²)

Thus the decision chosen by the decision rule s_v is

    mZ/(σ²/v² + m)                                                         (9.2)

Then

    r(θ;s_v) = E{[m(Z − θ)/(m + σ²/v²) − θ/(1 + mv²/σ²)]²}

             = m²E[(Z − θ)²]/(m + σ²/v²)² + θ²/(1 + mv²/σ²)²

             = (mv⁴/σ² + θ²)/(1 + mv²/σ²)²
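
The closed forms above can be checked numerically. The sketch below
evaluates the ratio of integrals directly by a midpoint sum and compares
it with (9.2); it also evaluates the expected-loss formula for a large
value of v. The function names, the integration limits, and the sample
values of z, m, σ, v are hypothetical choices made for illustration.

```python
from math import exp


def bayes_estimate_numeric(z, m, sigma, v, half_width=30.0, steps=60000):
    """Ratio of the two integrals over theta, computed by a midpoint sum.

    Constant factors common to numerator and denominator are omitted.
    """
    h = 2 * half_width / steps
    num = den = 0.0
    for i in range(steps):
        theta = -half_width + (i + 0.5) * h
        weight = exp(-theta**2 / (2 * v**2) - m * (z - theta) ** 2 / (2 * sigma**2))
        num += theta * weight
        den += weight
    return num / den


def bayes_estimate_formula(z, m, sigma, v):
    """The decision (9.2): m z / (sigma^2 / v^2 + m)."""
    return m * z / (sigma**2 / v**2 + m)


def risk_formula(theta, m, sigma, v):
    """Expected loss of s_v: (m v^4 / sigma^2 + theta^2) / (1 + m v^2 / sigma^2)^2."""
    return (m * v**4 / sigma**2 + theta**2) / (1 + m * v**2 / sigma**2) ** 2


numeric = bayes_estimate_numeric(z=1.7, m=4, sigma=2, v=3)
closed_form = bayes_estimate_formula(z=1.7, m=4, sigma=2, v=3)
large_v_risk = risk_formula(theta=0.5, m=4, sigma=2, v=1e4)
```

For these values `numeric` and `closed_form` agree closely, and
`large_v_risk` is close to σ²/m = 1.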
9.6. An Admissible Decision Rule Which Is a Limit of Bayes Decision
Rules. Continuing our discussion of the example of Sec. 9.5, we note
that

    lim r(θ;s_v) = σ²/m    as v → ∞

Denoting by s the decision rule which chooses the decision Z, it is
easily verified that r(θ;s) = σ²/m. Therefore the decision rule s is
the limit of the decision rules s_v as v increases, in the sense of
Sec. 5.10. We are going to show that s is a minimax and admissible
decision rule.

First we note that if s is admissible, it must be minimax. For if s is
not minimax, there would be a decision rule t with

    max_θ r(θ;t) < max_θ r(θ;s) = σ²/m

and therefore r(θ;t) < σ²/m = r(θ;s) for all θ, so that s would not be
admissible. Therefore we have only to show that s is admissible, and
this will also show that s is minimax.
In order to show that s is admissible, we assume that it is not
admissible and force a contradiction. If s is not admissible, there is
a decision rule t with r(θ;t) ≤ r(θ;s) for all θ, and r(θ;t) < r(θ;s)
for at least one value of θ. But since r(θ;t) and r(θ;s) are both
continuous functions of θ in the present problem, there must be values
A, B, Δ, with B > A and Δ > 0, such that r(θ;t) < r(θ;s) − Δ for all θ
between A and B. Then we have, for each v > 0,

    v ∫_{−∞}^{∞} [r(θ;s) − r(θ;t)] b_v(θ) dθ ≥ v ∫_{A}^{B} [r(θ;s) − r(θ;t)] b_v(θ) dθ

        > vΔ ∫_{A}^{B} (1/(v√(2π))) e^(−θ²/2v²) dθ = (Δ/√(2π)) ∫_{A}^{B} e^(−θ²/2v²) dθ

and this last expression approaches Δ(B − A)/√(2π) as v increases.
Therefore, as v increases, v ∫_{−∞}^{∞} [r(θ;s) − r(θ;t)] b_v(θ) dθ
remains no smaller than a quantity very close to Δ(B − A)/√(2π), which
is positive. It is easily verified that

    v ∫_{−∞}^{∞} [r(θ;s_v) − r(θ;s)] b_v(θ) dθ = −vσ²/[m(1 + mv²/σ²)]

which approaches zero as v increases. But

    v ∫_{−∞}^{∞} [r(θ;s_v) − r(θ;t)] b_v(θ) dθ = v ∫_{−∞}^{∞} [r(θ;s_v) − r(θ;s)] b_v(θ) dθ
                                                + v ∫_{−∞}^{∞} [r(θ;s) − r(θ;t)] b_v(θ) dθ

and therefore as v increases, v ∫_{−∞}^{∞} [r(θ;s_v) − r(θ;t)] b_v(θ) dθ
becomes positive. This means that a value v' can be found such that
v' ∫_{−∞}^{∞} [r(θ;s_v') − r(θ;t)] b_v'(θ) dθ is positive, which implies
that

    ∫_{−∞}^{∞} r(θ;s_v') b_v'(θ) dθ > ∫_{−∞}^{∞} r(θ;t) b_v'(θ) dθ

But this last inequality is a contradiction, by the definition of s_v'
as a Bayes decision rule relative to B_v'(θ). The contradiction proves
that s is admissible.
9.7. Estimation of Location Parameters. If f(x_1, ..., x_m; θ) can be
written in the form g(x_1 − θ, x_2 − θ, ..., x_m − θ), θ is called a
"location parameter." The reason is as follows. If we define the chance
variables Y_1, ..., Y_m by Y_1 = X_1 + c, Y_2 = X_2 + c, ...,
Y_m = X_m + c, where c is an arbitrary constant, then it is easily seen
that the joint pdf for Y_1, ..., Y_m is
g(y_1 − c − θ, ..., y_m − c − θ). But this is the same as the pdf for
X_1, ..., X_m, except that θ has been replaced by θ + c. Thus the
addition of a constant to each variable (that is, a "change of
location") has the effect of adding the same constant to θ in the joint
pdf.

Suppose the problem is to estimate θ, and suppose that
W(θ;D) = (D − θ)². If k(X_1, ..., X_m) denotes the decision chosen by
our decision rule when X_1, ..., X_m are observed, it seems reasonable,
because of the discussion in the preceding paragraph, to demand that
k(X_1 + c, ..., X_m + c) = k(X_1, ..., X_m) + c for all values of
X_1, ..., X_m and c. For the remainder of this section, we are going to
limit attention to decision rules which satisfy this condition, which
is called an "invariance condition."
If k(X_1, ..., X_m) is the decision chosen by the decision rule s and
k(X_1 + c, ..., X_m + c) = k(X_1, ..., X_m) + c for all values
X_1, ..., X_m and c, then

    r(θ;s) = ∫ ··· ∫ [k(x_1, ..., x_m) − θ]² g(x_1 − θ, ..., x_m − θ) dx_1 ··· dx_m

           = ∫ ··· ∫ [k(x_1 − θ, ..., x_m − θ)]² g(x_1 − θ, ..., x_m − θ) dx_1 ··· dx_m

In this last integral, if we make the change of variables
y_1 = x_1 − θ, y_2 = x_2 − θ, ..., y_m = x_m − θ, we get

    r(θ;s) = ∫ ··· ∫ [k(y_1, ..., y_m)]² g(y_1, ..., y_m) dy_1 ··· dy_m

and since this integral does not depend on θ, we see that r(θ;s) does
not depend on θ, if s satisfies the invariance condition. Clearly, we
should like to choose k(y_1, ..., y_m) to minimize the integral

    ∫ ··· ∫ [k(y_1, ..., y_m)]² g(y_1, ..., y_m) dy_1 ··· dy_m

By the invariance condition,
k(y_1 − y_m, ..., y_{m−1} − y_m, y_m − y_m) = k(y_1, ..., y_{m−1}, y_m) − y_m.
Therefore we have
k(y_1, ..., y_m) = y_m + k(y_1 − y_m, ..., y_{m−1} − y_m, 0). Putting
this expression into the integral to be minimized, we get

    ∫ ··· ∫ [y_m + k(y_1 − y_m, ..., y_{m−1} − y_m, 0)]² g(y_1, ..., y_m) dy_1 ··· dy_m

In this integral, we change to the variables t_1, ..., t_m defined as
follows: t_1 = y_1 − y_m, t_2 = y_2 − y_m, ...,
t_{m−1} = y_{m−1} − y_m, t_m = y_m. The integral to be minimized
becomes

    ∫ ··· ∫ [t_m + k(t_1, ..., t_{m−1}, 0)]² g(t_1 + t_m, ..., t_{m−1} + t_m, t_m) dt_1 ··· dt_m

It is easily seen that this last integral will be minimized if, for
each given set of values t_1, ..., t_{m−1}, we set the value of
k(t_1, ..., t_{m−1}, 0) so as to minimize

    ∫_{−∞}^{∞} [t_m + k(t_1, ..., t_{m−1}, 0)]² g(t_1 + t_m, ..., t_{m−1} + t_m, t_m) dt_m
Differentiating with respect to k(t_1, t_2, ..., t_{m−1}, 0) and
setting the result equal to zero, we find that the minimizing value of
k(t_1, ..., t_{m−1}, 0) is

      ∫_{−∞}^{∞} t_m g(t_1 + t_m, ..., t_{m−1} + t_m, t_m) dt_m
    − ----------------------------------------------------------        (9.3)
      ∫_{−∞}^{∞} g(t_1 + t_m, ..., t_{m−1} + t_m, t_m) dt_m

In any particular problem, (9.3) is computed, and then if
x_1, ..., x_m are the observed values, the decision chosen by the best
decision rule satisfying the invariance condition is
x_m + k(x_1 − x_m, ..., x_{m−1} − x_m, 0).

Before discussing examples, we note that in the special case when
m = 1, the expression (9.3) becomes

    − ∫_{−∞}^{∞} t g(t) dt / ∫_{−∞}^{∞} g(t) dt                          (9.4)

As a first example, suppose /(Х,..., Хы; 0) is given by 
1 fg [ 1 2 J 
— x x,— 0) 
m. = 20° pu ) 


a a is known. Defining Z as (1/т)(Х, +++ + Xm), we know that 
is sufficient for the decision problem. The pdf for Z is 


arameter in the distribution for Z. 


and А " 
nd we note that 0 remains a location р 
able Z, we use formula 


Sinc Р Ж : : 
€ we are basing our decision on the single vari 


(9.4) above, getting 
| н „АЙ м, (— E dt 


ED сү 27 o“ sü 
in m т o» 
| Vm exp — | dt 
-о OW 27 20° 
ance condition is to 
Zisz. This decision 


f03,...,x,:0) = Tif 16 cx, —0 <% 


and — 15 + % 
%<x, —0 <4, апа f(x +-> Хт 


Am 


152 STATISTICAL DECISION THEORY 


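for any $i$. Thus in computing (9.3), the integrands are zero unless all the following inequalities hold:

$$-\tfrac{1}{2} < t_1 + t_m < \tfrac{1}{2}$$
$$-\tfrac{1}{2} < t_2 + t_m < \tfrac{1}{2}$$
$$\cdots$$
$$-\tfrac{1}{2} < t_{m-1} + t_m < \tfrac{1}{2}$$
$$-\tfrac{1}{2} < t_m < \tfrac{1}{2}$$

Denoting by $L_1$ the largest of the quantities $-\tfrac{1}{2} - t_1, \ldots, -\tfrac{1}{2} - t_{m-1}, -\tfrac{1}{2}$, and by $L_2$ the smallest of the quantities $\tfrac{1}{2} - t_1, \ldots, \tfrac{1}{2} - t_{m-1}, \tfrac{1}{2}$, we see that the inequalities above are equivalent to $L_1 < t_m < L_2$. Furthermore, when $L_1 < t_m < L_2$, $f(t_1 + t_m, \ldots, t_{m-1} + t_m, t_m) = 1$. Therefore in the present problem (9.3) becomes

$$-\frac{\int_{L_1}^{L_2} t_m\,dt_m}{\int_{L_1}^{L_2} dt_m} = -\tfrac{1}{2}(L_1 + L_2)$$

and the decision chosen by the best decision rule satisfying the invariance condition is

$$x_m - \tfrac{1}{2}\left[\min\left(\tfrac{1}{2} - x_1 + x_m, \ldots, \tfrac{1}{2} - x_{m-1} + x_m, \tfrac{1}{2}\right) + \max\left(-\tfrac{1}{2} - x_1 + x_m, \ldots, -\tfrac{1}{2} - x_{m-1} + x_m, -\tfrac{1}{2}\right)\right]$$

Since

$$\min\left(\tfrac{1}{2} - x_1 + x_m, \ldots, \tfrac{1}{2} - x_{m-1} + x_m, \tfrac{1}{2}\right)$$

is equal to

$$\tfrac{1}{2} + x_m - \max(x_1, \ldots, x_m)$$

and

$$\max\left(-\tfrac{1}{2} - x_1 + x_m, \ldots, -\tfrac{1}{2} - x_{m-1} + x_m, -\tfrac{1}{2}\right)$$

is equal to

$$-\tfrac{1}{2} + x_m - \min(x_1, \ldots, x_m)$$

the decision chosen is

$$x_m - \tfrac{1}{2}\left[\tfrac{1}{2} + x_m - \max(x_1, \ldots, x_m) - \tfrac{1}{2} + x_m - \min(x_1, \ldots, x_m)\right]$$

which is equal to
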

$$\tfrac{1}{2}\left[\max(x_1, \ldots, x_m) + \min(x_1, \ldots, x_m)\right]$$

In this section we have limited our attention to decision rules satisfying an invariance condition and have found the best such decision rule. What assurance is there that this best invariant rule is admissible when compared with all possible decision rules? In the first of our two examples, we knew from other considerations that the rule was admissible. In many standard cases, a development like that in Sec. 9.6 can be used to show that the best invariant rule is admissible. If an invariant rule is admissible, it is also minimax: this follows from the same reasoning as that used in the second paragraph of Sec. 9.6.
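
The algebra in the uniform example can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names and the sample values are illustrative assumptions. It evaluates the decision in the form $x_m - \tfrac{1}{2}[\min(\cdot) + \max(\cdot)]$ and compares it with the midrange $\tfrac{1}{2}[\max + \min]$, and it also checks the invariance condition under a shift of all observations.

```python
def best_invariant_location_uniform(xs):
    """Decision x_m - (1/2)[min(...) + max(...)] from the uniform example."""
    x_m = xs[-1]
    others = xs[:-1]
    # Smallest of the quantities 1/2 - x_i + x_m (i < m) and 1/2.
    upper = min([0.5 - x + x_m for x in others] + [0.5])
    # Largest of the quantities -1/2 - x_i + x_m (i < m) and -1/2.
    lower = max([-0.5 - x + x_m for x in others] + [-0.5])
    return x_m - 0.5 * (upper + lower)


def midrange(xs):
    """Closed form (1/2)[max + min] obtained at the end of the example."""
    return 0.5 * (max(xs) + min(xs))


# Hypothetical observed values, chosen only for illustration.
sample = [2.31, 2.05, 2.44, 2.18]
decision = best_invariant_location_uniform(sample)
shifted_decision = best_invariant_location_uniform([x + 10.0 for x in sample])
```

Under these assumptions `decision` agrees with `midrange(sample)`, and adding a constant to every observation adds the same constant to the decision.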


9.8. Estimation of Scale Parameters. If $X_1, \ldots, X_m$ are chance variables which are nonnegative with probability 1, and $f(x_1, \ldots, x_m; \theta)$ can be written in the form $(1/\theta^m)\,g(x_1/\theta, \ldots, x_m/\theta)$, where $\theta$ is positive, then $\theta$ is called a "scale parameter." The reason is as follows. If we define the chance variables $Y_1, \ldots, Y_m$ by $Y_1 = cX_1$, $Y_2 = cX_2$, ..., $Y_m = cX_m$, where $c$ is an arbitrary positive constant, then it is easily seen that the joint pdf for $Y_1, \ldots, Y_m$ is $[1/(c\theta)^m]\,g(y_1/c\theta, \ldots, y_m/c\theta)$. But this is the same as the pdf for $X_1, \ldots, X_m$, except that $\theta$ has been multiplied by $c$. Thus the multiplication of each variable by a positive constant (that is, a "change of scale") has the effect of multiplying $\theta$ by the same constant in the joint pdf.
Suppose the problem is to estimate $\theta$, and suppose that $W(\theta; D) = (1/\theta^2)(D - \theta)^2$. If $K(X_1, \ldots, X_m)$ denotes the decision chosen by our decision rule when $X_1, \ldots, X_m$ are observed, it seems reasonable, in view of the discussion in the preceding paragraph, to demand that $K(cx_1, \ldots, cx_m) = cK(x_1, \ldots, x_m)$, for all positive values $x_1, \ldots, x_m$ and $c$. For the remainder of this section, we are going to limit attention to decision rules which satisfy this invariance condition.

If $K(x_1, \ldots, x_m)$ is the decision chosen by a decision rule $s$ satisfying the invariance condition, then


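$$r(\theta; s) = \int_0^\infty \cdots \int_0^\infty \frac{1}{\theta^2}\left[K(x_1, \ldots, x_m) - \theta\right]^2 \frac{1}{\theta^m}\, g\!\left(\frac{x_1}{\theta}, \ldots, \frac{x_m}{\theta}\right) dx_1 \cdots dx_m$$

In this last integral, if we make the change of variables $y_1 = x_1/\theta$, ..., $y_m = x_m/\theta$, we get

$$r(\theta; s) = \int_0^\infty \cdots \int_0^\infty \left[K(y_1, \ldots, y_m) - 1\right]^2 g(y_1, \ldots, y_m)\, dy_1 \cdots dy_m$$

and since this integral does not depend on $\theta$, $r(\theta; s)$ does not depend on $\theta$. Thus we would like to choose $K(y_1, \ldots, y_m)$ to minimize the integral

$$\int_0^\infty \cdots \int_0^\infty \left[K(y_1, \ldots, y_m) - 1\right]^2 g(y_1, \ldots, y_m)\, dy_1 \cdots dy_m$$

By the invariance condition, $K(y_1/y_m, y_2/y_m, \ldots, y_{m-1}/y_m, y_m/y_m) = (1/y_m)K(y_1, \ldots, y_m)$. Therefore we have $K(y_1, \ldots, y_m) = y_m\,k(y_1/y_m, \ldots, y_{m-1}/y_m)$, where $k(t_1, \ldots, t_{m-1})$ denotes $K(t_1, \ldots, t_{m-1}, 1)$. Putting this expression into the integral to be minimized, we get

$$\int_0^\infty \cdots \int_0^\infty \left[y_m\,k\!\left(\frac{y_1}{y_m}, \ldots, \frac{y_{m-1}}{y_m}\right) - 1\right]^2 g(y_1, \ldots, y_m)\, dy_1 \cdots dy_m$$
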
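In this integral, we change to the variables $t_1, \ldots, t_m$ defined as follows: $t_1 = y_1/y_m$, ..., $t_{m-1} = y_{m-1}/y_m$, $t_m = y_m$. Then the integral to be minimized becomes

$$\int_0^\infty \cdots \int_0^\infty \left[t_m\,k(t_1, \ldots, t_{m-1}) - 1\right]^2 g(t_1 t_m, \ldots, t_{m-1} t_m, t_m)\, t_m^{m-1}\, dt_1 \cdots dt_m$$

It is easily seen that this last integral will be minimized if, for each set of values $t_1, \ldots, t_{m-1}$, we set the value of $k(t_1, \ldots, t_{m-1})$ so as to minimize

$$\int_0^\infty \left[t_m\,k(t_1, \ldots, t_{m-1}) - 1\right]^2 g(t_1 t_m, \ldots, t_{m-1} t_m, t_m)\, t_m^{m-1}\, dt_m$$

Differentiating with respect to $k(t_1, \ldots, t_{m-1})$ and setting the result equal to zero, we find that the minimizing value of $k(t_1, \ldots, t_{m-1})$ is

$$\frac{\int_0^\infty t_m^{m}\, g(t_1 t_m, \ldots, t_{m-1} t_m, t_m)\, dt_m}{\int_0^\infty t_m^{m+1}\, g(t_1 t_m, \ldots, t_{m-1} t_m, t_m)\, dt_m} \qquad (9.5)$$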


In any particular problem, (9.5) is computed, and then if $x_1, \ldots, x_m$ are the observed values, the decision chosen by the best decision rule satisfying the invariance condition is $x_m\,k(x_1/x_m, \ldots, x_{m-1}/x_m)$.

Before discussing examples, we note that in the special case when $m = 1$, the expression (9.5) becomes

$$\frac{\int_0^\infty t\,g(t)\,dt}{\int_0^\infty t^2 g(t)\,dt} \qquad (9.6)$$


Our first example will violate one of our assumptions, since the variables $X_1, \ldots, X_m$ will not have to be nonnegative. However, there will be a chance variable $Z$ sufficient for the decision problem, and $Z$ will have to be nonnegative, so in terms of $Z$ our assumptions will be satisfied.

Suppose $X_1, \ldots, X_m$ are independent, each with a normal distribution with known mean $\mu$ and unknown standard deviation $\theta$. Thus

$$f(x_1, \ldots, x_m; \theta) = \left(\frac{1}{\theta\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\theta^2}\sum_{i=1}^{m}(x_i - \mu)^2\right]$$

Then it is easily verified that $Z = \sqrt{\sum_{i=1}^{m}(X_i - \mu)^2}$ is sufficient for the decision problem. From Sec. 4.8, we know that $Z^2/\theta^2$ has a chi-square distribution with $m$ degrees of freedom. From this, using the technique described in Sec. 3.7, we find that the pdf for $Z$, $g(z; \theta)$, say, is given as follows:

$$g(z; \theta) = 0 \quad \text{if } z < 0$$
$$g(z; \theta) = \frac{1}{2^{(m-2)/2}\,\Gamma(m/2)}\,\frac{1}{\theta}\left(\frac{z}{\theta}\right)^{m-1} e^{-z^2/(2\theta^2)} \quad \text{if } z > 0$$

We see that $\theta$ is a scale parameter in the distribution for $Z$. Since we are basing our decision on the single variable $Z$, we use formula (9.6) above, getting (after canceling common constant factors)

$$\frac{\int_0^\infty t^{m} e^{-t^2/2}\, dt}{\int_0^\infty t^{m+1} e^{-t^2/2}\, dt} \qquad (9.7)$$


To evaluate (9.7), we note that we can evaluate $\int_0^\infty t^{r} e^{-t^2/2}\, dt$ by making the change of variable $w = \tfrac{1}{2}t^2$, the integral then becoming

$$2^{(r-1)/2} \int_0^\infty w^{(r-1)/2} e^{-w}\, dw = 2^{(r-1)/2}\,\Gamma\!\left(\frac{r+1}{2}\right)$$

Thus (9.7) is equal to

$$\frac{2^{(m-1)/2}\,\Gamma[(m+1)/2]}{2^{m/2}\,\Gamma[(m+2)/2]} = \frac{1}{\sqrt{2}}\,\frac{\Gamma[(m+1)/2]}{\Gamma[(m+2)/2]}$$

and the best invariant decision rule is to choose the decision

$$\frac{z}{\sqrt{2}}\,\frac{\Gamma[(m+1)/2]}{\Gamma[(m+2)/2]}$$

when the observed value of $Z$ is $z$.
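
As a check on the evaluation of (9.7), the ratio of integrals can be approximated numerically and compared with the gamma-function expression. The sketch below is an illustration added here, not the author's computation: the midpoint rule, the truncation point `upper`, and the function names are all assumptions made for the example.

```python
import math


def ratio_9_7(m, upper=30.0, steps=100_000):
    """Midpoint-rule approximation of the ratio of integrals in (9.7)."""
    h = upper / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        weight = math.exp(-0.5 * t * t)
        numerator += t ** m * weight          # integrand t^m e^{-t^2/2}
        denominator += t ** (m + 1) * weight  # integrand t^(m+1) e^{-t^2/2}
    return numerator / denominator


def closed_form(m):
    """(1/sqrt(2)) * Gamma((m+1)/2) / Gamma((m+2)/2)."""
    return math.gamma((m + 1) / 2) / (math.sqrt(2.0) * math.gamma((m + 2) / 2))


def best_invariant_scale_normal(z, m):
    """Decision chosen when the observed value of Z is z."""
    return z * closed_form(m)
```

For any moderate `m` the two functions should agree closely, since the truncated tail beyond `upper` is negligible.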

As a second example, suppose $X_1, \ldots, X_m$ are all independent, each with a uniform distribution between 0 and $\theta$. Thus $f(x_1, \ldots, x_m; \theta) = 1/\theta^m$ if $0 < x_1/\theta < 1$, $0 < x_2/\theta < 1$, ..., and $0 < x_m/\theta < 1$, and $f(x_1, \ldots, x_m; \theta) = 0$ if any of the quantities $x_1, \ldots, x_m$ is below zero or above $\theta$. We note that in this problem, the function $g(y_1, \ldots, y_m)$ is equal to 1 if all the quantities $y_1, \ldots, y_m$ are between 0 and 1 and equal to 0 if any of the quantities $y_1, \ldots, y_m$ is below 0 or above 1. In computing the expression (9.5) for this case, we see that the integrands are 0 unless the following inequalities all hold:

$$t_1 t_m < 1$$
$$t_2 t_m < 1$$
$$\cdots$$
$$t_{m-1} t_m < 1$$
$$t_m < 1$$

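Letting $U$ denote $\min(1/t_1, 1/t_2, \ldots, 1/t_{m-1}, 1)$, the inequalities are equivalent to $t_m < U$. Therefore (9.5) becomes

$$\frac{\int_0^U t_m^{m}\, dt_m}{\int_0^U t_m^{m+1}\, dt_m} = \frac{U^{m+1}/(m+1)}{U^{m+2}/(m+2)} = \frac{m+2}{m+1}\,\frac{1}{U}$$

Thus the decision chosen by the best invariant decision rule is

$$x_m\,\frac{m+2}{m+1}\,\frac{1}{U'}$$

where $U' = \min(x_m/x_1, x_m/x_2, \ldots, x_m/x_{m-1}, 1)$. But we can write

$$U' = \min\left(\frac{x_m}{x_1}, \frac{x_m}{x_2}, \ldots, \frac{x_m}{x_{m-1}}, \frac{x_m}{x_m}\right) = x_m \min\left(\frac{1}{x_1}, \frac{1}{x_2}, \ldots, \frac{1}{x_{m-1}}, \frac{1}{x_m}\right) = \frac{x_m}{\max(x_1, x_2, \ldots, x_m)}$$

so that $1/U' = (1/x_m)\max(x_1, x_2, \ldots, x_{m-1}, x_m)$. Therefore the decision chosen by the best invariant decision rule is

$$\frac{m+2}{m+1}\,\max(x_1, \ldots, x_m)$$
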

9.9. Interval Estimation. In the preceding sections, we have been discussing the problem of point estimation. Another type of estimation is "interval estimation," in which the possible decisions are intervals. We should like the interval we choose to contain the true value of $\theta$, but we should also like the interval to be short in length. Let $D_1$ denote the lower end point of the interval we choose and $D_2$ denote the upper end point of the interval. Then a reasonable type of loss function would seem to be as follows:

$$W(\theta; D_1, D_2) = c(\theta)(D_2 - D_1) \quad \text{if } D_1 \le \theta \le D_2$$
$$W(\theta; D_1, D_2) = A + c(\theta)(D_2 - D_1) \quad \text{if } \theta < D_1 \text{ or } \theta > D_2$$

where $A$ = a given positive constant
$c(\theta)$ = a given function of $\theta$ which is never negative


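With the type of loss function introduced in the preceding paragraph, suppose we want to construct a Bayes decision rule relative to an a priori distribution $B(\theta)$ with pdf $b(\theta)$. Then

$$K(D_1, D_2; x) = \int_{\theta < D_1} \left[A + c(\theta)(D_2 - D_1)\right] b(\theta) f(x_1, \ldots, x_m; \theta)\, d\theta$$
$$\qquad + \int_{D_1}^{D_2} c(\theta)(D_2 - D_1)\, b(\theta) f(x_1, \ldots, x_m; \theta)\, d\theta$$
$$\qquad + \int_{\theta > D_2} \left[A + c(\theta)(D_2 - D_1)\right] b(\theta) f(x_1, \ldots, x_m; \theta)\, d\theta$$

The Bayes decision rule chooses the values of $D_1$, $D_2$ (with $D_1 \le D_2$) which minimize $K(D_1, D_2; x)$. The equation $(\partial/\partial D_1) K(D_1, D_2; x) = 0$ gives

$$A\,b(D_1) f(x_1, \ldots, x_m; D_1) = \int_{-\infty}^{\infty} c(\theta)\, b(\theta) f(x_1, \ldots, x_m; \theta)\, d\theta \qquad (9.8)$$

and the equation $(\partial/\partial D_2) K(D_1, D_2; x) = 0$ gives

$$A\,b(D_2) f(x_1, \ldots, x_m; D_2) = \int_{-\infty}^{\infty} c(\theta)\, b(\theta) f(x_1, \ldots, x_m; \theta)\, d\theta \qquad (9.9)$$

To ensure a minimum, in any specific example the second derivatives of $K(D_1, D_2; x)$ should be examined.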
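As an example, suppose $f(x_1, \ldots, x_m; \theta)$ is

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(x_i - \theta)^2\right]$$

where $\sigma$ = a known quantity
$c(\theta) = C$, a positive constant

Defining $Z$ as $(1/m)(X_1 + \cdots + X_m)$, it is known that $Z$ is sufficient for the decision problem. The pdf for $Z$, $g(z; \theta)$, say, is

$$\frac{\sqrt{m}}{\sigma\sqrt{2\pi}} \exp\left[-\frac{m}{2\sigma^2}(z - \theta)^2\right]$$

Suppose we want to construct a Bayes decision rule $s_v$ relative to the a priori distribution with

$$\text{pdf } b_v(\theta) = \frac{1}{v\sqrt{2\pi}} \exp\left(-\frac{\theta^2}{2v^2}\right)$$

Then equations (9.8) and (9.9) become

$$A\,\frac{1}{v\sqrt{2\pi}} \exp\left(-\frac{D_1^2}{2v^2}\right) \frac{\sqrt{m}}{\sigma\sqrt{2\pi}} \exp\left[-\frac{m}{2\sigma^2}(z - D_1)^2\right] = C \int_{-\infty}^{\infty} \frac{1}{v\sqrt{2\pi}} \exp\left(-\frac{\theta^2}{2v^2}\right) \frac{\sqrt{m}}{\sigma\sqrt{2\pi}} \exp\left[-\frac{m}{2\sigma^2}(z - \theta)^2\right] d\theta$$

$$A\,\frac{1}{v\sqrt{2\pi}} \exp\left(-\frac{D_2^2}{2v^2}\right) \frac{\sqrt{m}}{\sigma\sqrt{2\pi}} \exp\left[-\frac{m}{2\sigma^2}(z - D_2)^2\right] = C \int_{-\infty}^{\infty} \frac{1}{v\sqrt{2\pi}} \exp\left(-\frac{\theta^2}{2v^2}\right) \frac{\sqrt{m}}{\sigma\sqrt{2\pi}} \exp\left[-\frac{m}{2\sigma^2}(z - \theta)^2\right] d\theta$$
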

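The integral appearing in these equations has been evaluated in Sec. 9.5: its value is

$$\frac{R\sqrt{m}}{v\sigma\sqrt{2\pi}} \exp\left(-\frac{mz^2}{2\sigma^2}\right) \exp\left(\frac{R^2 m^2 z^2}{2\sigma^4}\right)$$

where $R = (1/v^2 + m/\sigma^2)^{-1/2}$. Simplifying the equations and taking logs, we find that both $D_1$ and $D_2$ satisfy the following quadratic equation in $D$:

$$\frac{D^2}{2R^2} - \frac{mz}{\sigma^2}D + \frac{R^2 m^2 z^2}{2\sigma^4} = -\log\frac{CR\sqrt{2\pi}}{A} \qquad (9.10)$$

The left side is $(1/2R^2)(D - R^2 mz/\sigma^2)^2$, so (9.10) has two distinct real roots exactly when the right side is positive. If $CR\sqrt{2\pi} < A$, (9.10) has two distinct real roots, and $D_1$ is the lower of the two roots, $D_2$ is the higher of the two roots. If $CR\sqrt{2\pi} \ge A$, (9.10) does not have two distinct real roots, and we have a degenerate situation where $D_1 = D_2$.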
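
The roots of (9.10) can be written explicitly as $R^2 mz/\sigma^2 \pm R\sqrt{-2\log(CR\sqrt{2\pi}/A)}$. The following Python sketch computes them under the normal model and normal a priori distribution of this example; it is an illustration only. The function name, the argument names, and the choice of returning the center point in the degenerate case are assumptions, not something the text specifies.

```python
import math


def bayes_interval(z, m, sigma, v, A, C):
    """End points (D1, D2) solving (9.10) for observed mean z.

    sigma: known standard deviation of each observation
    v:     standard deviation of the normal a priori distribution
    A, C:  constants of the loss function, with c(theta) = C
    """
    R = (1.0 / v ** 2 + m / sigma ** 2) ** -0.5
    center = R ** 2 * m * z / sigma ** 2
    log_term = math.log(C * R * math.sqrt(2.0 * math.pi) / A)
    if log_term >= 0.0:
        # Degenerate case D1 = D2; returning the center is an arbitrary choice.
        return center, center
    half_width = R * math.sqrt(-2.0 * log_term)
    return center - half_width, center + half_width


# Hypothetical inputs, chosen only for illustration.
d1, d2 = bayes_interval(z=1.2, m=9, sigma=2.0, v=3.0, A=5.0, C=1.0)
```

The interval is centered at the posterior mean $R^2 mz/\sigma^2$ and widens as $A$ grows relative to $C$, which matches the role of $A$ as the penalty for missing $\theta$.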


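9.10. Estimation by the Method of Maximum Likelihood. Suppose that the possible joint distributions of $X_1, \ldots, X_m$ are given by the variation of $u$ parameters, say, $\theta_1, \ldots, \theta_u$, and the problem is to construct estimates of these parameters. Let $D_i$ denote the estimate of $\theta_i$. The following decision rule, known as the method of "maximum likelihood," is currently in very wide use: Choose the values of $D_1, D_2, \ldots, D_u$ so that

$$f(x_1, \ldots, x_m; D_1, D_2, \ldots, D_u) = \max_{\theta_1, \theta_2, \ldots, \theta_u} f(x_1, \ldots, x_m; \theta_1, \ldots, \theta_u)$$

We give several examples of the maximum-likelihood decision rule.

Example 1. $X_1, X_2, \ldots, X_m$ are all independent, each with a normal distribution with known standard deviation $\sigma$ and unknown mean $\theta$. The maximum-likelihood decision rule chooses the value of $\theta$ that maximizes

$$f(x_1, \ldots, x_m; \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(x_i - \theta)^2\right]$$

This is the same value that maximizes

$$\log f(x_1, \ldots, x_m; \theta) = -m \log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}(x_i - \theta)^2$$

Differentiating, we have

$$\frac{\partial}{\partial\theta}\log f(x_1, \ldots, x_m; \theta) = \frac{1}{\sigma^2}\sum_{i=1}^{m}(x_i - \theta)$$

and solving the equation

$$\frac{1}{\sigma^2}\sum_{i=1}^{m}(x_i - \theta) = 0$$

we get

$$\sum_{i=1}^{m} x_i - m\theta = 0 \quad \text{or} \quad \theta = \frac{1}{m}\sum_{i=1}^{m} x_i$$

Thus the maximum-likelihood decision rule is to choose the decision $(1/m)(x_1 + \cdots + x_m)$ when the observed values are $x_1, \ldots, x_m$.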

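Example 2. $X_1, \ldots, X_m$ are all independent, each with a normal distribution with known mean $\mu$ and unknown standard deviation $\theta$. Then

$$f(x_1, \ldots, x_m; \theta) = \left(\frac{1}{\theta\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\theta^2}\sum_{i=1}^{m}(x_i - \mu)^2\right]$$

and

$$\frac{\partial}{\partial\theta}\log f(x_1, \ldots, x_m; \theta) = -\frac{m}{\theta} + \frac{\sum_{i=1}^{m}(x_i - \mu)^2}{\theta^3}$$

Solving the equation

$$-\frac{m}{\theta} + \frac{\sum_{i=1}^{m}(x_i - \mu)^2}{\theta^3} = 0$$

we get $\theta = \sqrt{(1/m)\sum (x_i - \mu)^2}$. Thus, the maximum-likelihood decision rule is to choose the decision $\sqrt{(1/m)\sum (x_i - \mu)^2}$ when the observed values are $x_1, \ldots, x_m$.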

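Example 3. $X_1, \ldots, X_m$ are all independent, each with a normal distribution with unknown mean $\theta_1$ and unknown standard deviation $\theta_2$. Both are to be estimated.

$$f(x_1, \ldots, x_m; \theta_1, \theta_2) = \left(\frac{1}{\theta_2\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\theta_2^2}\sum_{i=1}^{m}(x_i - \theta_1)^2\right]$$

The equations

$$\frac{\partial \log f(x_1, \ldots, x_m; \theta_1, \theta_2)}{\partial\theta_1} = 0 \quad \text{and} \quad \frac{\partial \log f(x_1, \ldots, x_m; \theta_1, \theta_2)}{\partial\theta_2} = 0$$

give

$$\frac{1}{\theta_2^2}\sum_{i=1}^{m}(x_i - \theta_1) = 0$$

$$-\frac{m}{\theta_2} + \frac{\sum_{i=1}^{m}(x_i - \theta_1)^2}{\theta_2^3} = 0$$

The first equation gives $(1/m)(x_1 + \cdots + x_m)$ as the maximizing value of $\theta_1$. Denoting $(1/m)(x_1 + \cdots + x_m)$ by $\bar{x}$ for convenience, the second equation gives $\sqrt{(1/m)\sum_{i=1}^{m}(x_i - \bar{x})^2}$ as the maximizing value of $\theta_2$.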
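
The closed-form estimates of Example 3 are easy to compute directly. The sketch below is a small illustration, with function names that are assumptions and not part of the text; it also includes the log of the joint pdf so that the maximizing property can be spot-checked at nearby parameter values.

```python
import math


def normal_mle(xs):
    """Maximum-likelihood estimates (theta_1, theta_2) from Example 3."""
    m = len(xs)
    theta1 = sum(xs) / m
    # Note the divisor m (not m - 1), as in the text.
    theta2 = math.sqrt(sum((x - theta1) ** 2 for x in xs) / m)
    return theta1, theta2


def log_likelihood(xs, theta1, theta2):
    """log f(x_1, ..., x_m; theta_1, theta_2) for the normal model."""
    m = len(xs)
    return (-m * math.log(theta2 * math.sqrt(2.0 * math.pi))
            - sum((x - theta1) ** 2 for x in xs) / (2.0 * theta2 ** 2))


# Hypothetical observed values, chosen only for illustration.
observed = [1.0, 2.0, 4.0, 5.0]
mean_hat, sd_hat = normal_mle(observed)
```

For these values the estimates are 3.0 and $\sqrt{2.5}$, and moving either parameter slightly away from its estimate lowers `log_likelihood`.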


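Example 4. $X_1, \ldots, X_m$ are all independent, each with a uniform distribution over $\theta - \tfrac{1}{2}$, $\theta + \tfrac{1}{2}$. Thus $f(x_1, \ldots, x_m; \theta) = 1$ if $\theta - \tfrac{1}{2} < x_1 < \theta + \tfrac{1}{2}$, ..., $\theta - \tfrac{1}{2} < x_m < \theta + \tfrac{1}{2}$, and $f(x_1, \ldots, x_m; \theta) = 0$ for other values of $x_1, \ldots, x_m$. Let $L_1$ denote $\min(x_1, \ldots, x_m)$ and $L_2$ denote $\max(x_1, \ldots, x_m)$. Then, for any value of $\theta$ between $L_2 - \tfrac{1}{2}$ and $L_1 + \tfrac{1}{2}$, the value of $f(x_1, \ldots, x_m; \theta)$ is equal to 1. Thus, in this case, there is no unique maximizing value of $\theta$.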


Example 5. $X_1, \ldots, X_m$ are all independent, each with a uniform distribution over 0, $\theta$, where $\theta$ is a positive quantity which is unknown. Then $f(x_1, \ldots, x_m; \theta) = 1/\theta^m$ for $0 \le x_1 \le \theta$, ..., and $0 \le x_m \le \theta$, and is equal to zero for other values of $x_1, \ldots, x_m$. Let $L$ denote $\max(x_1, \ldots, x_m)$. Then $f(x_1, \ldots, x_m; \theta) = 1/\theta^m$ for $\theta \ge L$, and $f(x_1, \ldots, x_m; \theta) = 0$ for $\theta < L$. Clearly, the maximizing value of $\theta$ is $L$, and so the maximum-likelihood decision rule is to set the decision equal to the largest of the observed values.

We have stated that the maximum-likelihood decision rule is very widely used. Since this is the case, one might suppose that the maximum-likelihood decision rule is always a good decision rule [...]. However, this is not always true. Actually, it has been shown that if $X_1, \ldots, X_m$ are all independent with the same distribution (as is the case in our examples above) and if the loss function is of the form $c(\theta)(D - \theta)^2$, then if certain restrictions are satisfied by the distribution, the maximum-likelihood decision rule comes closer and closer to [...] decision rule as $m$ increases. We shall not prove this, but we point out that what happens as $m$ approaches infinity is of limited interest in practical problems, where $m$ is fixed.

The examples above give some information about how good the maximum-likelihood decision rule is. In Example 1, the maximum-likelihood decision rule has been shown to be admissible and minimax. In Example 2, if we denote $\sqrt{\sum (x_i - \mu)^2}$ by $Z$, we know from Sec. 9.8 that $\theta$ is a scale parameter in the distribution for $Z$. The maximum-likelihood decision rule satisfies the invariance condition of Sec. 9.8, but is not the same as the best invariant decision rule found in Sec. 9.8. Thus the maximum-likelihood decision rule in Example 2 is inadmissible, and so is the maximum-likelihood decision rule in Example 5, for the same reasons. Thus the maximum-likelihood decision rule is sometimes a poor decision rule.
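
The remark about Example 5 can be made concrete. For $m$ independent observations uniform over 0, $\theta$, the largest observation $L$ satisfies $E\{L\} = m\theta/(m+1)$ and $E\{L^2\} = m\theta^2/(m+2)$, so the expected loss of the rule "choose $cL$" under $W(\theta; D) = (1/\theta^2)(D - \theta)^2$ is $c^2 m/(m+2) - 2cm/(m+1) + 1$ for every $\theta$. The sketch below evaluates this exactly with rational arithmetic. It is an added illustration (the function name and the choice $m = 5$ are assumptions), comparing the maximum-likelihood rule ($c = 1$) with the best invariant rule of Sec. 9.8 ($c = (m+2)/(m+1)$).

```python
from fractions import Fraction


def risk_of_scaled_max(c, m):
    """Exact expected loss of the decision c * max(x_1, ..., x_m).

    Loss is (D - theta)^2 / theta^2, observations are uniform over 0, theta;
    the result does not depend on theta.
    """
    c = Fraction(c)
    return c * c * Fraction(m, m + 2) - 2 * c * Fraction(m, m + 1) + 1


m = 5
ml_risk = risk_of_scaled_max(1, m)                          # maximum likelihood
best_risk = risk_of_scaled_max(Fraction(m + 2, m + 1), m)   # best invariant rule
```

With $m = 5$ the maximum-likelihood rule has expected loss 1/21 and the best invariant rule has expected loss 1/36, for every value of $\theta$.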


9.11. Testing a Hypothesis by the Likelihood Ratio Method. The following decision rule $s$, known as the "likelihood-ratio rule," is currently in very wide use for the problem of testing a hypothesis: Choose $D = 1$ if

$$\frac{\max_{\theta \text{ in I}} f(x_1, \ldots, x_m; \theta)}{\max_{\theta} f(x_1, \ldots, x_m; \theta)} > k$$

where $k$ is a constant chosen so that

$$\max_{\theta \text{ in I}} r(\theta; s) = \alpha$$

The quantity

$$\frac{\max_{\theta \text{ in I}} f(x_1, \ldots, x_m; \theta)}{\max_{\theta} f(x_1, \ldots, x_m; \theta)}$$

is known as the "likelihood ratio."

We give some examples of the use of the likelihood-ratio rule.

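Example 1. $X_1, X_2, X_3, X_4$ are independent, each with a normal distribution with standard deviation equal to 1 and unknown mean $\theta$. Group I consists of all distributions given by values of $\theta$ less than or equal to 2. (Note that it is not necessary to specify group II in applying the likelihood-ratio decision rule.) $\alpha = 0.1$. Denoting $(\tfrac{1}{4})(x_1 + x_2 + x_3 + x_4)$ by $z$, we know from Example 1 of Sec. 9.10 that

$$\max_{\theta} f(x_1, \ldots, x_4; \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^4 \exp\left[-\frac{1}{2}\sum_{i=1}^{4}(x_i - z)^2\right]$$

It is easily seen that

$$\max_{\theta \text{ in I}} f(x_1, \ldots, x_4; \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^4 \exp\left[-\frac{1}{2}\sum_{i=1}^{4}(x_i - z)^2\right] \quad \text{if } z \le 2$$

$$\max_{\theta \text{ in I}} f(x_1, \ldots, x_4; \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^4 \exp\left[-\frac{1}{2}\sum_{i=1}^{4}(x_i - 2)^2\right] \quad \text{if } z > 2$$

Therefore the likelihood ratio is equal to 1 if $z \le 2$ and is equal to

$$\exp\left[-\frac{1}{2}\sum_{i=1}^{4}(x_i - 2)^2 + \frac{1}{2}\sum_{i=1}^{4}(x_i - z)^2\right] = \exp\left[-2(z - 2)^2\right]$$

if $z > 2$. Since $\exp[-2(z - 2)^2]$ decreases as $z$ increases beyond 2, it is clear that the likelihood-ratio test in this case chooses $D = 1$ when $z < c$, where $c$ is a value chosen so that

$$\max_{\theta \text{ in I}} r(\theta; s) = 0.1$$

In the numerical example of Sec. 9.3, $c$ was shown to be 2.64. Thus we see that in the present example, the likelihood-ratio decision rule is the same as the minimax decision rule found in Sec. 9.3.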
Example 2. $X_1, \ldots, X_{16}$ are independent, each with the same normal distribution with standard deviation equal to 2 and unknown mean $\theta$. Group I consists of the single distribution given by $\theta = 3$. $\alpha = 0.05$.

$$\max_{\theta \text{ in I}} f(x_1, \ldots, x_{16}; \theta) = f(x_1, \ldots, x_{16}; 3) = \left(\frac{1}{2\sqrt{2\pi}}\right)^{16} \exp\left[-\frac{1}{8}\sum_{i=1}^{16}(x_i - 3)^2\right]$$

Denoting $(\tfrac{1}{16})(x_1 + \cdots + x_{16})$ by $z$, we know from Example 1 of Sec. 9.10 that

$$\max_{\theta} f(x_1, \ldots, x_{16}; \theta) = \left(\frac{1}{2\sqrt{2\pi}}\right)^{16} \exp\left[-\frac{1}{8}\sum_{i=1}^{16}(x_i - z)^2\right]$$

Therefore the likelihood ratio is equal to $\exp[-2(z - 3)^2]$, and since this decreases as $|z - 3|$ increases, it is clear that the likelihood-ratio rule in this case chooses $D = 1$ when $|z - 3| < c$, where $c$ is a value chosen so that $r(3; s) = 0.05$. Table 1 in the Appendix shows that the proper value of $c$ is equal to 0.98. Whether this rule is minimax depends on how group II is chosen. The rule is not minimax for the numerical example in Sec. 9.4.
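
The cutoff values in Examples 1 and 2 can be recomputed from the normal distribution of the sample mean $z$, which has standard deviation $\sigma/\sqrt{m}$. The sketch below uses Python's standard `statistics.NormalDist` in place of Table 1 of the Appendix. It assumes, as an interpretation not stated explicitly in this passage, that $r(\theta; s)$ for $\theta$ in group I is the probability of choosing $D = 2$; the function names are illustrative.

```python
from statistics import NormalDist


def one_sided_cutoff(theta0, sigma, m, alpha):
    """c such that P(z >= c) = alpha when the true mean is theta0."""
    return NormalDist(theta0, sigma / m ** 0.5).inv_cdf(1.0 - alpha)


def two_sided_halfwidth(sigma, m, alpha):
    """c such that P(|z - theta0| >= c) = alpha when the true mean is theta0."""
    return NormalDist(0.0, sigma / m ** 0.5).inv_cdf(1.0 - alpha / 2.0)


c_example_1 = one_sided_cutoff(theta0=2.0, sigma=1.0, m=4, alpha=0.1)
c_example_2 = two_sided_halfwidth(sigma=2.0, m=16, alpha=0.05)
```

Under these assumptions the first value is about 2.64 (the boundary point $\theta = 2$ of group I gives the largest error probability) and the second is about 0.98.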
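Example 3. $X_1, \ldots, X_r, X_{r+1}, \ldots, X_m$ are all independent, each with a normal distribution with the same standard deviation $\theta_{r+1}$, which is unknown. The means of $X_{r+1}, \ldots, X_m$ are known to be zero; the mean of $X_i$ is an unknown quantity $\theta_i$, for $i = 1, 2, \ldots, r$. Thus

$$f(x_1, \ldots, x_m; \theta_1, \ldots, \theta_r, \theta_{r+1}) = \left(\frac{1}{\theta_{r+1}\sqrt{2\pi}}\right)^m \exp\left\{-\frac{1}{2\theta_{r+1}^2}\left[\sum_{i=1}^{r}(x_i - \theta_i)^2 + \sum_{i=r+1}^{m} x_i^2\right]\right\}$$

Group I consists of the distributions with $\theta_1 = \theta_2 = \cdots = \theta_r = 0$. To maximize $f(x_1, \ldots, x_m; \theta_1, \ldots, \theta_r, \theta_{r+1})$ with respect to $\theta_1, \ldots, \theta_r, \theta_{r+1}$, we first set $\theta_i = x_i$ for $i = 1, \ldots, r$. Then we have to maximize

$$\left(\frac{1}{\theta_{r+1}\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\theta_{r+1}^2}\sum_{i=r+1}^{m} x_i^2\right]$$

with respect to $\theta_{r+1}$. As in Example 2 of Sec. 9.10, we find that the maximizing value of $\theta_{r+1}$ is $\sqrt{(1/m)\sum_{i=r+1}^{m} x_i^2}$, and therefore

$$\max_{\theta} f(x_1, \ldots, x_m; \theta_1, \ldots, \theta_{r+1}) = \left(\frac{2\pi}{m}\sum_{i=r+1}^{m} x_i^2\right)^{-m/2} e^{-m/2}$$
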
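To find $\max_{\theta \text{ in I}} f(x_1, \ldots, x_m; \theta_1, \ldots, \theta_r, \theta_{r+1})$, we note that $\theta_1 = \cdots = \theta_r = 0$, and so we must maximize

$$\left(\frac{1}{\theta_{r+1}\sqrt{2\pi}}\right)^m \exp\left[-\frac{1}{2\theta_{r+1}^2}\sum_{i=1}^{m} x_i^2\right]$$

with respect to $\theta_{r+1}$. The maximizing value of $\theta_{r+1}$ is $\sqrt{(1/m)\sum_{i=1}^{m} x_i^2}$, and therefore

$$\max_{\theta \text{ in I}} f(x_1, \ldots, x_m; \theta_1, \ldots, \theta_{r+1}) = \left(\frac{2\pi}{m}\sum_{i=1}^{m} x_i^2\right)^{-m/2} e^{-m/2}$$
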

The likelihood ratio is then equal to

$$\left(\frac{\sum_{i=r+1}^{m} x_i^2}{\sum_{i=1}^{m} x_i^2}\right)^{m/2}$$

Defining $T$ as

$$\frac{m - r}{r}\,\frac{x_1^2 + \cdots + x_r^2}{x_{r+1}^2 + \cdots + x_m^2}$$

it can easily be seen that $T$ is a strictly decreasing function of the likelihood ratio. Therefore the likelihood-ratio decision rule is equivalent to choosing $D = 1$ when $T < c$, where $c$ is a constant chosen so that

$$\max_{\theta \text{ in I}} r(\theta; s) = \alpha$$

Since $T$ can be written as

$$\frac{m - r}{r}\,\frac{(x_1/\theta_{r+1})^2 + \cdots + (x_r/\theta_{r+1})^2}{(x_{r+1}/\theta_{r+1})^2 + \cdots + (x_m/\theta_{r+1})^2}$$

and since the standard deviation of $X_i/\theta_{r+1}$ is equal to 1, we know that when $\theta_1 = \theta_2 = \cdots = \theta_r = 0$, $T$ has an F distribution with $r$ degrees of freedom in the numerator and $m - r$ degrees of freedom in the denominator. This fact enables us to find the value of $c$ from Table 4 in the Appendix. When $\theta_1, \ldots, \theta_r$ are not all zero, we know from Sec. 4.12 that $T$ has a noncentral F distribution with $r$ degrees of freedom in the numerator, $m - r$ degrees of freedom in the denominator, and noncentrality parameter $(\theta_1^2 + \cdots + \theta_r^2)/\theta_{r+1}^2$.
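
The statement that $T$ is a strictly decreasing function of the likelihood ratio follows from the identity: likelihood ratio $= [1 + rT/(m - r)]^{-m/2}$. The sketch below computes both quantities from a list of observations. It is an added illustration: the function names and the sample values are assumptions, and the first $r$ entries of the list are taken to be $x_1, \ldots, x_r$.

```python
def f_statistic(xs, r):
    """T = [(m - r)/r] * (x_1^2 + ... + x_r^2) / (x_{r+1}^2 + ... + x_m^2)."""
    m = len(xs)
    head = sum(x * x for x in xs[:r])
    tail = sum(x * x for x in xs[r:])
    return (m - r) / r * head / tail


def likelihood_ratio(xs, r):
    """(sum_{i>r} x_i^2 / sum_i x_i^2)^(m/2) from Example 3."""
    m = len(xs)
    tail = sum(x * x for x in xs[r:])
    total = sum(x * x for x in xs)
    return (tail / total) ** (m / 2)


# Hypothetical observed values with r = 2 and m = 6.
values = [1.0, 2.0, 1.0, -1.0, 2.0, 0.5]
t_value = f_statistic(values, 2)
ratio = likelihood_ratio(values, 2)
```

For these values $T = 1.6$ and the likelihood ratio equals $(1 + 0.8)^{-3}$, in agreement with the identity.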
The problem discussed in Example 3 is the simplest representative of a very large class of problems known as "analysis-of-variance problems." Most texts on conventional statistical methods contain detailed descriptions of such problems.

How good is the likelihood-ratio decision rule? The situation is much as in the case of maximum-likelihood estimates. In some problems the likelihood-ratio decision rule is a good decision rule; in other problems it is inadmissible.

Section 1.3

1. If C, D, E are events, the event (C or D or E) is defined as the event which occurs on any trial where at least one of the events C, D, E occurs. The event (C and D and E) is defined as the event which occurs on any trial where all the events C, D, and E occur. Show that for any events C, D, E,
P(C or D or E) = P(C) + P(D) + P(E) − P(C and D) − P(C and E) − P(D and E) + P(C and D and E)
2. Prove that P(D and E) ≤ P(D). Under what circumstances is P(D and E) = P(D)?
3. If the top card is drawn from a well-shuffled deck, what is the probability that it is an ace or a spade?
4. If a well-balanced die is rolled, what is the probability that the face that will come up is even or greater than 3?


Section 1.4

1. If an experiment consists in thoroughly shuffling a deck of cards and turning up the top card, what is the conditional probability that the top card is a spade, given that it is a [...] card?
2. If a well-balanced die is rolled, what is the conditional probability that the face that will come up is even, given that it is greater than 3?
3. If the top card of a well-shuffled deck is turned up, what is the conditional probability that it is an ace, given that it is a spade?


Section 1.5

1. If an experiment consists in thoroughly shuffling a deck of cards and turning up the top card, which of the following pairs of events are independent:
(a) Top card spade, top card black.
(b) Top card spade, top card red.
(c) Top card ace, top card black.
(d) Top card ace, top card above a 10.
2. If two well-balanced coins are tossed, what is the probability both will come up head? What is the probability that one will come up head and the other tail?
3. If C is independent of E and D is independent of E, is the event (C and D) necessarily independent of E? Is the event (C or D) necessarily independent of E?
4. If C is independent of D, is the event (not C) necessarily independent of D?
Section 1.6

1. If n well-balanced coins are tossed, what is the probability that they will all come up head?
2. If n well-balanced coins are tossed, what is the probability that they will all come up with the same face?
3. For any events $A_1, A_2, \ldots, A_n$, show that $P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \text{ and } A_2) \cdots P(A_n \mid A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_{n-1})$.
4. Find an example where there are events D, E, F with D, E independent, D, F independent, E, F independent, but D, E, F not mutually independent.
5. If $A_1, \ldots, A_n$, B are any events, show that the event [($A_1$ or ··· or $A_n$) and B] is exactly the same event as [($A_1$ and B) or ··· or ($A_n$ and B)].
6. If $A_1, \ldots, A_n$ are mutually exclusive by pairs and B is any event, show that the events ($A_1$ and B), ..., ($A_n$ and B) are mutually exclusive by pairs.
7. If $A_1, \ldots, A_n$ are mutually exclusive by pairs, show that $P(A_1 \text{ or } A_2 \text{ or } \cdots \text{ or } A_n \mid B) = P(A_1 \mid B) + P(A_2 \mid B) + \cdots + P(A_n \mid B)$.
8. Show that $P(A \text{ or } B \mid C) = P(A \mid C) + P(B \mid C) - P(A \text{ and } B \mid C)$.
Section 1.8

1. Write out a formal proof of the fact that the number of different ways of choosing $k$ places out of $m$ places is $m!/[k!\,(m - k)!]$.
2. Prove that the number of different ways of dividing $m$ places into $r$ different groups, with exactly $k_i$ places in group $i$, where $k_1 + k_2 + \cdots + k_r = m$, is $m!/(k_1!\,k_2! \cdots k_r!)$.
3. Suppose an experiment has $r$ possible outcomes, with the probability of an outcome of type $i$ equal to $p_i$, where $p_1 + \cdots + p_r = 1$. If the experiment is performed $m$ separate times, prove that
P(outcome of type 1 occurs exactly $k_1$ times, and outcome of type 2 occurs exactly $k_2$ times, and ··· and outcome of type $r$ occurs exactly $k_r$ times)
$$= \frac{m!}{k_1!\,k_2! \cdots k_r!}\; p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$$
where $k_1, k_2, \ldots, k_r$ are any nonnegative integers with $k_1 + \cdots + k_r = m$. (Use the formula of Exercise 2 and the reasoning of Example 1.)
4. Find the probability that the top five cards of a well-shuffled deck will be of the same unspecified suit.
5. Find the probability that the top five cards of a well-shuffled deck contain four cards of the same unspecified denomination.
6. Find the probability that the top five cards of a well-shuffled deck contain cards of five different denominations.
7. If Example 3 is generalized [...], find the probability that the card drawn from box II is red.
8. If Exercise 7 is modified by having [...], find the probability that the card drawn from box II is red.
9. If [...] from a well-shuffled deck, what is the probability that [...] is the second heart drawn?
10. If a fair die is rolled twice, what is the probability that the total number of spots will be 7?
11. If a fair die is rolled eight times, what is the probability that three times a 1-spot will appear, three times a 2-spot will appear, and twice a 5-spot will appear?
12. If a fair die is rolled until a 1-spot appears, find the probability that fewer than 5 rolls will be required.


Section 2.2

1. If two fair dice are thrown, what is the probability distribution for the chance variable defined as the total number of spots that will be face up?
2. If a coin has probability 1/4 of coming up head on each toss, and it is tossed three times, what is the probability distribution of the chance variable defined as the number of tosses on which a head will come up?
3. If three fair dice are rolled, what is the probability distribution for the chance variable defined as the largest of the three numbers that will be face up?
4. If a hand of four cards is dealt from a well-shuffled deck, what is the probability distribution for the chance variable defined as the number of spades in the hand?

Section 2.3

1. If $r(x)$, $s(x)$ are any functions of $x$, and A and B are any constants, show that $E\{A\,r(X) + B\,s(X)\} = A\,E\{r(X)\} + B\,E\{s(X)\}$.
2. For any constant $c$, prove that $E\{(X - c)^2\} = E\{X^2\} - 2cE\{X\} + c^2$. What value of $c$ makes $E\{(X - c)^2\}$ a minimum?
3. If X denotes the chance variable defined in Exercise 1 of Sec. 2.2, compute $E\{X\}$ and $E\{X^2\}$.
4. Suppose that a magazine dealer buys a certain magazine from the publisher at 20 cents per copy, sells to customers at 50 cents per copy, and returns unsold copies to the publisher for 10 cents per copy. There are 10 potential customers for the magazine, and each has probability 1/4 of actually requesting the magazine. If the dealer stocks 5 copies of the magazine, what is the expected value of his net profit? What number of copies should he stock to maximize his expected net profit?
5. If X denotes the chance variable defined in Exercise 3 of Sec. 2.2, find $E\{X\}$.
6. If X denotes the chance variable defined in Exercise 4 of Sec. 2.2, find $E\{X\}$.
7. Find a chance variable X with a probability distribution such that $E\{X^2\}$ = [...]

Section 2.[...]

1. Draw the graph of the cumulative distribution function for the chance variable of Exercise 1 of Sec. 2.2. Find $P(2 \le X < 6)$ from the graph.
2. Draw the graph of the cumulative distribution function for the chance variable of Exercise 2 of Sec. 2.2.
3. Draw the graph of the cumulative distribution function for the chance variable of Exercise 3 of Sec. 2.2. From the graph, find $P(3 < X < 6)$.
4. Draw the graph of the cumulative distribution function of Exercise 4 of Sec. 2.2.

Section 2.6

1. Suppose a fair coin is tossed 5 times. Let X denote the number of times a head will come up, and Y denote the number of times a tail will come up. Construct the table of the joint probability distribution for X, Y.
2. Suppose we construct a coin with probability [...] of coming up head, [...] of coming up tail, and probability [...] of landing on edge. The coin is tossed 5 times. Let X denote the number of times a head will come up, and let Y denote the number of times a tail will come up. Construct the table of the joint probability distribution for X, Y.
3. Suppose four fair dice are rolled. Let X denote the smallest number that will come up, and let Y denote the largest. Construct the table of the joint probability distribution for X, Y.
4. Suppose a hand of four cards is dealt from a well-shuffled deck. Let X denote the number of spades in the hand and Y denote the number of [...] in the hand. Construct the table of the joint probability distribution for X, Y.


Section 2.7

1. For the chance variables X, Y defined in Exercise 1 of Sec. 2.6, compute $E\{X\}$, $E\{Y\}$, $E\{X + 3Y\}$, $E\{XY\}$, $E\{X^2\}$, $E\{X + Y^2\}$, $E\{X^2 + Y^3\}$.
2. In Exercise 2 of Sec. 2.6, define Z as the number of times the coin will stand on edge. Compute $E\{XYZ\}$, $E\{X + Y + Z\}$, $E\{X^2YZ\}$.
3. For the chance variables X, Y defined in Exercise 3 of Sec. 2.6, compute $E\{X + Y\}$, $E\{XY\}$, $E\{X/Y\}$.
4. For the chance variables X, Y defined in Exercise 4 of Sec. 2.6, compute $E\{XY\}$, $E\{3XY\}$, [...].


Section 2.8

1. For the chance variables defined in Exercise 3 of Sec. 2.6, compute $F(x, y)$ for all combinations $x$, $y$ appearing in the headings of the tabled probability distribution.


Section 2.9

1. For the chance variables X, Y defined in Exercise 1 of Sec. 2.6, find the marginal distributions for X and Y in table form, and plot the marginal [...] for X and Y.
2. For the chance variables X, Y, Z defined in Exercise 2 of Sec. 2.7, find the joint marginal distribution for X, Y in table form.
3. For the chance variables defined in Exercise 3 of Sec. 2.6, find the marginal distribution for X and the marginal distribution for Y, in table form.
4. Prove that $E\{r(X)\}$ is the same whether it is computed from the joint probability distribution for X, Y or from the marginal distribution for X, where $r(x)$ is any function.

Section 2.10

1. For the chance variables X, Y defined in Exercise 1 of Sec. 2.6, find the conditional distribution for X given that Y = [...].
2. For the chance variables X, Y, Z defined in Exercise 2 of Sec. 2.7, find the joint conditional distribution (in table form) for X, Y, given that Z = 1. Find the conditional distribution for X, given that Y + Z = 2.
3. Suppose that X, Y are any jointly distributed chance variables. Suppose each of the symbols $S_1, \ldots, S_k$ denotes a certain set of possible values of Y, and $A_i$ denotes the event (Y takes a value in $S_i$). Show that if the events $A_1, \ldots, A_k$ are mutually exclusive by pairs and $P(A_1) + P(A_2) + \cdots + P(A_k) = 1$, then $E\{X\} = P(A_1)E\{X \mid A_1\} + P(A_2)E\{X \mid A_2\} + \cdots + P(A_k)E\{X \mid A_k\}$.
4. For the chance variables X, Y defined in Exercise 3 of Sec. 2.6, find the conditional distribution for Y, given that X = 2. Find $E\{Y \mid X = 2\}$.

Section 2.11

1. If X, Y are independent, show that $E\{[X - E\{X\}][Y - E\{Y\}]\} = 0$.
2. Find a joint probability distribution for X, Y such that $E\{XY\} = E\{X\}E\{Y\}$ but X, Y are not independent.
3. Prove that if $F(x, y) = F_1(x)F_2(y)$ for all values of $x$, $y$, then $P(X = x \text{ and } Y = y) = P(X = x)P(Y = y)$ for all values of $x$, $y$.
4. If the set $X_1, \ldots, X_r$ is independent of the set $Y_1, \ldots, Y_s$, prove that $E\{r(X_1, \ldots, X_r)\,s(Y_1, \ldots, Y_s)\} = E\{r(X_1, \ldots, X_r)\}\,E\{s(Y_1, \ldots, Y_s)\}$, for any functions $r(x_1, \ldots, x_r)$, $s(y_1, \ldots, y_s)$.
Section 3.1

1. If a fair die is rolled until a 6-spot appears, and X is defined as the number of rolls that will be required, what is the probability distribution of X?
2. If a fair die is rolled until a 6-spot appears on two successive rolls, and X is defined as the number of rolls that will be necessary, what is the probability distribution of X?

Section 3.3

1. Describe the scale S which would give as the cdf for X the function $F(x)$ defined as follows:
F(x) = 0 [...]
F(x) = [...]
F(x) = 1 [...]
2. Describe the scale S which would give as the cdf for X the function $F(x)$ defined as follows:
F(x) = [...] for x < 0
F(x) = 1 for x > 0

Section 3.4

1. Find the pdf corresponding to the cdf of Exercise 1 of Sec. 3.3.
2. Find the pdf corresponding to the cdf of Exercise 2 of Sec. 3.3.
3. Find the cdf corresponding to the pdf $f(x)$ defined as follows:
f(x) = 0 for x < 0
f(x) = x for 0 < x < 1
f(x) = 2 − x for 1 < x < 2
f(x) = 0 for x > 2
4. Which of the following functions $f(x)$ are probability density functions:
(a) $f(x) = 0$ for $x < 0$, $f(x) = e^{-x}$ for $x > 0$
(b) $f(x) = \tfrac{1}{2}e^{-|x|}$ for $-\infty < x < \infty$
(c) $f(x)$ = [...] for $-\infty < x < \infty$
(d) $f(x) = 2e^{-2x}$ for $-\infty < x < \infty$
(e) $f(x)$ = [...]

Section 3.5

1. Suppose a well-balanced arrowhead spinner is mounted on a round dial marked from 0 to 1. Let X be the number to which the arrowhead will point. (If the arrowhead points to the place which could be either 0 or 1, it is read as 1.) If only $k$ decimal places can be read on the dial, compute $E\{X\}$, $E\{X^2\}$. What do these approach as $k$ increases?
2. For the pdf in Exercise 3 of Sec. 3.4, compute $E\{X\}$, $E\{X^2\}$.
3. For each $f(x)$ in Exercise 4 of Sec. 3.4 which is a pdf, compute $E\{X\}$.


Section 3.6

1. Suppose we have two well-balanced arrowheaded spinners, each mounted as in Exercise 1 of Sec. 3.5. One of the dials can be read to two decimal places, the other can be read to an infinite number of decimal places. One of the spinners is chosen at random and set in motion, and the chance variable X is defined as the number to which the spinner will point. What is the cdf for X? Compute $E\{X\}$, $E\{X^2\}$.
Section 3.7 


1. Suppose that the number of copies of a magazine that will be requested from
a newsstand is a chance variable X, where


10! р 10 
‚шй Зу уо for X = Û, Дә 
еч x!(10 — xi 4 a) 


If the dealer makes a net profit of 50 cents on each magazine sold and a “profit”
of −20 cents on each magazine unsold, what is the probability distribution of his
net profit if he stocks 6 magazines? What number should he stock to maximize the
expected value of his net profit?
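The stocking question in the magazine exercise can be explored numerically. The following is only a sketch under stated assumptions: demand is taken to be binomial with n = 10, the success probability `p` is a placeholder (0.5 below is an assumed value, to be replaced by the one printed in the exercise), and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for a binomial(n, p) demand."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def net_profit(stock, demand):
    """Net profit in cents: +50 per copy sold, -20 per copy left unsold."""
    sold = min(stock, demand)
    return 50 * sold - 20 * (stock - sold)

def expected_profit(stock, pmf):
    """Expected net profit in cents when `stock` copies are stocked."""
    return sum(prob * net_profit(stock, x) for x, prob in enumerate(pmf))

def best_stock(pmf):
    """Stock level in 0..n with the largest expected net profit."""
    return max(range(len(pmf)), key=lambda s: expected_profit(s, pmf))

if __name__ == "__main__":
    pmf = binomial_pmf(10, 0.5)  # p = 0.5 is an assumption, not the book's value
    for s in range(11):
        print(s, round(expected_profit(s, pmf), 2))
    print("best stock:", best_stock(pmf))
```

The distribution of net profit for a fixed stock (for example 6) follows by applying `net_profit(6, x)` to each demand value x and adding the probabilities of demands that give the same profit.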
2. Suppose that a tax on gross income described as follows is imposed:

Gross income                          Percentage of income taken
Above 0, but less than 5,000          10
5,000 or above, less than 10,000      15
10,000 or above, less than 25,000     20
25,000 or above                       40

If gross income is a chance variable with pdf f(x) = 0 for x < 0, f(x) =
0.0001e^(−0.0001x) for x > 0, find the pdf for income after tax is taken out.
3. If X has cdf 1 − e^(−x) for x > 0, find the cdf for the chance variable Y defined as
1 − e^(−X).

4. If X has cdf y^ ford < x < 


dt 
< 1, where is a given positive number, fin 
for the chance variable Y defined as Ж“, . 4, find the 


5. If X has pdf f(x) = 1 for 0 < x < 1, f(x) = 0 for x < 0 or x > 1, find the
pdf for the chance variable Y defined as −log X.
Section 3.8 


1. Suppose X, Y have the joint pdf f(x,y) = 1 for 0 < x < 1 and 0 < y < 1,
f(x,y) = 0 for x < 0 or x > 1 or y < 0 or y > 1. Find the cdf F(x,y). Find
БО жй «ы М ү. s 


16), PAZ < X+ Y<] EÍXM ЕХ + Y) EX т 
Е{ХҮ} CEP T MASS © DEUX Et "E 
2. Suppose X, Y have the Joint pdf f(xy) — l/m for x? + у? < nis "2 
IUE x ES Find the cdf Р(х,у). “Find э < X <0,0< 
P(X + Y2 14). E(X}, BL Р Y), ЕГА 2 


2; d 
+ YS EY 6, 3:50 S 
(х,у) = 2 for x > 0, J Р(х. 
or y <0 coe y > 1. Find the cdf 
4. Suppose X, Y have the joint pdf f(x,y) = 3(x + y) for x > 0, y > 0, and
x + y < 1, f(x,y) = 0 for x < 0 or y < 0 or x + y > 1. Find the cdf F(x,y).
02K < M, 14 < Y < J), E(X + Yh ЕК + УР) Е? +). 


Section 3.9 


ds: TR x ү s Е 3 " 

pat for Ww E on ba r^ par amiben in Exercise 1 of Sec. 3.8, find the joint 
2. ТЕД Y ha P сш A run р 

for jj^ ^ acr v pdf described in Exercise 2 of Sec. 3.8, find the joint pdf 
3. It 3 >К а ae “ез Р 

for а z. = Jour par described in Exercise 3 of Sec. 3.8, find the joint pdf 
4. If Eis MS ME MM | 

for be шеш шнш in Exercise 4 of Sec. 3.8, find the joint pdf 


Section 3.10 

1. If X, Y have the joint pdf described in Exercise 1 of Sec. 3.8, find the marginal
cdf and pdf for X and the marginal cdf and pdf for Y. Are X, Y independent
chance variables? Find the conditional pdf for X given that Y = ½, and
E{X | Y = ½}.
2. If X, Y have the joint pdf described in Exercise 2 of Sec. 3.8, find the marginal
cdf and pdf for X and the marginal cdf and pdf for Y. Are X, Y independent chance
variables? Find the conditional pdf for X given that Y = 0. Find the conditional
pdf for X given that Y < 0. Find E{X² | Y = 0}, E{X | Y < 0}.
3. If X, Y have the joint pdf described in Exercise 3 of Sec. 3.8, find the marginal
cdf and pdf for X and the marginal cdf and pdf for Y. Are X, Y independent
chance variables? Find the conditional cdf for X given that Y = ½.
4. If X, Y have the joint pdf described in Exercise 4 of Sec. 3.8, find the marginal
pdf and cdf for X and the marginal pdf and cdf for Y. Are X, Y independent
chance variables? Find the conditional pdf for Y given that X = ½. Find
E{Y | X = ½}.


Section 3.11 


1 
of jee a formula for P(x < X < Xe Ji 7 
2 ar quantities Fx iii). Е(х\у,у 
چ‎ 1 | ^i У, Z have joint pdf f Gc.y.2) TY for x = 0,7 
раг For b (xyz) = 0 for x <0 or y < 0 or z <0 or x 
(а) a = X + Y + Z by two different methods: : 
Variables. method described in the text of introducing convenient 


к: Fina f [resa dx dy dz, and differentiate with resp! 
à 2, find fi (ху), fa”) 


3. ЖЕШ аны ais e Жын 
For the chance variables X, Y, Z described in Exercise ^ 


f 

AE E(XY|Z = 14). 
Fees, the joint pdf for X, Y, Z is / G2 = geri for x > 0, y 
J х,у) = д for x <0 or y <0 or z < 0, find the pdf for W = d 


first А + 
finding | [ren dx dy dz. 


,4«Z2*« z,) in terms 


,0,andx-k yr? 


+z > 1, find the 


"extra" chance 


ect to w. 


E 


5. If X1, ..., Xm, Y1, ..., Yn are jointly distributed chance variables, where the
set of chance variables X1, ..., Xm is independent of the set of chance variables
Y1, ..., Yn, and if the chance variable W is defined as a function of X1, ..., Xm
and the chance variable Z is defined as a function of Y1, ..., Yn, prove that W, Z
are independent.
6. For each of the joint distributions described in the exercises of Sec. 3.8, find
the pdf for W = X + Y.


Section 4.1 


гу) 
) = г< Oor x > 1, find ЕХ}, 
h df f(x) = 1 foro <x <1, f(x) = 0 for x p > MO 
Брз) £0] p s integration and also by finding M x(r) and differentia 8 


2. If X has the probability distribution given by the following table: 


Possible values 12 8 4 5 


i Z 
Probability мики 1% 


" and 
omputation and also by finding M (1) an 
3. If X has pdf f(x) = 1/[π(1 + x²)], which moments of X exist? For which
values of t does M_X(t) exist?


Section 4.2 


1. For each of the following probability distributions, state whether or not it is
a binomial distribution. If it is, state what the parameters are.


(a) 0 1 2 3 
AS E PN EL 
Yar 95 EM 
(b) 0 1 2 
—— E 
Ms | Ke %6 
2. If Xhasa 


binomial distribution with 
3. Show that if x, X, are ind 


AM وا‎ d 
a binomial distribution with the 


i M rt p í pa 
with parameters n, p, find Е(ХЗ}, E(X? 
Section 4.3 


1. If on the average 1 person in 10,000 contracts a certain noncontagious disease
in one year, what is the probability that more than 3 people in a town of 20,000
contract the disease in a given year?
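A quick numerical check of the Poisson approximation for the disease exercise may help. This sketch assumes the town has 20,000 people (the figure is partly garbled in the source), so the Poisson parameter is 20,000/10,000 = 2; the exact binomial tail is computed alongside for comparison, and the function names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_tail(k, lam):
    """P(X > k) for a Poisson chance variable with parameter lam."""
    return 1.0 - sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))

def binomial_tail(k, n, p):
    """P(X > k) for a binomial chance variable with parameters n, p."""
    return 1.0 - sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

if __name__ == "__main__":
    n, p = 20_000, 1 / 10_000      # assumed town size and yearly rate
    print("Poisson :", poisson_tail(3, n * p))
    print("binomial:", binomial_tail(3, n, p))
```

The two printed values should agree to about three decimal places, which is the sense in which the Poisson distribution approximates the binomial here.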

2, If X has а попі distribution with п = 100 and p= 00, ШО d 
P(X = 1), Pry = 2). Compare these үг ith the Corresponding probabi 
given by the Poisson distribution with pa 


3. If X has a Poisson distribution with parameter λ, find E{X³}, E{X⁴}.
4. If X1, ..., Xn are independent chance variables, each with a Poisson
distribution, the parameter for Xi equal to λi, show that Z = X1 + ··· + Xn has
a Poisson distribution. What is the parameter for Z?
5. In each of the following cases, describe why the chance variable could
reasonably be supposed to have a Poisson distribution.
(a) X is the number of typographical errors on a given page of a book, when the
page contains a large number of symbols.
(b) X is the number of telephone calls originated during a given brief time period
in a large city.
(c) X is the number of defects in a given piece of cloth.
i Suppose the probability that the ith person lemen < Klar all c. 
ina iven time eriod is Pi- Show that if there are л people, E TUR total ушп 
inis 4 2 A then as n increases, the bgt p жы ез и са: 
A NR » 8 Н гъ iod арргог 
of раз who vil contract the disease in the given period app 
istribution with parameter 2. 


Section 4.4
: MA — B, find E(X). 
1. If X hasa hypergeometric distribution with parameters л, А, I 


utior : ы п, К, В, show by 
2. If Y has а hypergeometric distribution with parameters 
direct Calculation that 


3i в wj B " 
n m POP wd» хп — x)! (s + л) (х А 


3. If a batch of 100 items contains 5 defectives and a sample of 6 items is drawn
from the batch, compute the probability distribution for the number of defectives
in the sample.
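The batch-sampling exercise (100 items, 5 defective, sample of 6) is a direct hypergeometric computation. A minimal sketch, assuming sampling without replacement; the function and argument names are hypothetical.

```python
from math import comb

def hypergeom_pmf(x, defectives, good, sample):
    """P(exactly x defectives) when `sample` items are drawn without replacement."""
    if x < 0 or x > defectives or sample - x < 0 or sample - x > good:
        return 0.0
    return comb(defectives, x) * comb(good, sample - x) / comb(defectives + good, sample)

def distribution(defectives=5, good=95, sample=6):
    """Probability of each possible number of defectives in the sample."""
    return {x: hypergeom_pmf(x, defectives, good, sample)
            for x in range(min(defectives, sample) + 1)}

if __name__ == "__main__":
    for x, prob in distribution().items():
        print(x, round(prob, 6))
```

The probabilities sum to 1 and the mean number of defectives is 6 × 5/100 = 0.3, which gives two easy checks on any hand calculation.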


4. If X has a hypergeometric distribution with parameters n, A, B, what conditions
on the parameters would imply that the distribution is approximately
Poisson?
Section 4.5 
find Mx(t). 
lity has a uniform distribution between A ай Б addidi 
4 has а uniform distribution Веви А and B, I 
What is the distribution of Y = (X — Oy D? ———— 
| "Onstruct a joint distribution for x 1 so а. meng ins | 
ка © Marginal distributions is a uniform distri nem Me oii 
about ЧРроѕе Y has саг F(x), where F(x) is not contin i 
“Sout t € edf for Y F(X)? 
Section 4.6


as a normal distribution 
ih Suppose the number of items that will be demanded nas ОРТ и 
that Ten 10,000 and standard deviation 1,000. Ma Sa a Site tena 
D» 143000 items will be demanded? That fewer dan e | 

B сеп 9.000 and 12,000 items will be demande РИ — 

"Ppose the demand is as described in Exercise 1, E I 
ma iı, Ch item 5014, a net 1055 of 50 cents is made cw onem И 

ANY items Should be stocked to maximize expecte be by toe рой of « 
"wd "pose that the situation in — э rahe rere, ts percentage 

Чү ы Жор ked to maximize 

E tax iş too "e profit, аз po items should be stocked 

Pecte Value of net profit after taxation? „о, find Е[Х®}, BUX, 

d aS à normal distribution with parameters u, б, 


E ing the fact that 
j Evaluate væ f exp (—(14)3*) dy approximately by using 
ti S imati -04)9 in 
Mis qual to Wy a (Vin) p ( —(14) Y) dy and approximating exp (—(14 : 
i eol coor this approxi- 
nia, last expressio by1 do 2 + у%/8 — y5/48 + y*/384. Compare this app 

n * +. J ) 
oh Value with tbe Майы in Table 1 in the Appendix. 
Section 4.7


1. If X has a Poisson distribution with parameter λ, prove that the distribution
of (X − λ)/√λ approaches the standard normal distribution as λ increases.
2. If X, Y are independent, each with a normal distribution with parameters
μ1, σ1 and μ2, σ2, respectively, find the distribution of X + Y.
3. If X has a binomial distribution with parameters n, p, prove that the moment
generating function for (X − np)/√(np(1 − p)) approaches the moment generating
function for a standard normal distribution as n increases.
4. If X1, X2, ..., Xn are independent chance variables, each with a uniform
distribution between 0 and 1, define Yn as (X1 + ··· + Xn − (½)n)/√(n/12) and
find the moment generating function of Yn. What does this moment generating
function approach as n increases?


Section 4.8 


1. If W has a chi-square distribution with n degrees of freedom, show that the
moment generating function for (W − n)/√(2n) approaches the moment generating
function for the standard normal distribution, as n increases.
2. If W has a chi-square distribution with 2 degrees of freedom, find the cdf.
3. Suppose that W has a chi-square distribution with 30 degrees of freedom.
Using the fact that (W − 30)/√60 has approximately a standard normal distribution,
approximate the value of w for which P(W < w) = A, for the values of A given in
Table 2 in the Appendix. Compare these approximate values with the exact values
from Table 2 in the Appendix.
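The normal approximation in the chi-square exercise reduces to w ≈ 30 + z√60, where z is the standard normal quantile for the probability A. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed normal table; the probabilities looped over are illustrative choices, not the entries of the book's Table 2.

```python
from math import sqrt
from statistics import NormalDist

def approx_chi2_quantile(prob, df):
    """Normal approximation to the w with P(W < w) = prob, W chi-square with df d.f."""
    z = NormalDist().inv_cdf(prob)   # standard normal quantile
    return df + z * sqrt(2 * df)     # mean df, standard deviation sqrt(2*df)

if __name__ == "__main__":
    for prob in (0.05, 0.50, 0.95):  # illustrative probabilities only
        print(prob, round(approx_chi2_quantile(prob, 30), 3))
```

Because the chi-square distribution is skewed to the right, the approximation tends to sit below the exact upper-tail values, which is the kind of discrepancy the comparison with the table is meant to reveal.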


Section 4.9


| ЖЕЧҮ with 
has a noncentral chi-square distribution 

4 decreases as m increases. find 
neentral chi-square distribution with parameters л and m, 
EW}, EL wy, Ew3}, 


Р — m. 
à noncentral chi-square distribution with parameters 7 and 
W by A and the 


at as 
Standard deviation of Ww by B. Show tha 
ent generating fu 


n ent 
: | nction for (W — A)/B approaches the mom 
generating function for the Standard normal distribution, 


T 
grees of freedom, show that the pdf for 


ndard n as n increases. 
* has a 7 distribution with nd 
1 


Я 2 
Е CCS of freedom, find E/T}, EIT?) Д 
has a z distribution with 1 degree of freedom, find and plot the cdf for T 
4. If T hasa ; distribution with 3 degrees of freedom, find and plot the cdf for 
Section 4.11 


1. If T has a t distribution with n degrees of freedom, show that T² has an F
distribution.
3. If Z has an F distribution with r degrees of freedom in the numerator and s
degrees of freedom in the denominator, find the mean and variance of Z.

4. If Z has an F distribution with r degrees of freedom in the numerator and s
degrees of freedom in the denominator, which moments of Z exist?


Section 4.12

1. If Z has a noncentral F distribution with noncentrality parameter m, show that
P(Z < z) decreases as m increases.
Section 4.13 


1. Find the correlation coefficient between X and Y for each of the following
joint pdf's:

(a) f(x,y) = 2 for x > 0, y > 0, and x + y < 1
    f(x,y) = 0 for x < 0, or y < 0, or x + y > 1
(b) f(x,y) = 1/π for x² + y² < 1
    f(x,y) = 0 for x² + y² > 1
(c) f(x,y) = e^(−x−y) for x > 0 and y > 0
    f(x,y) = 0 for x < 0 or y < 0

2. Construct a joint probability distribution in table form for chance variables
X, Y such that ρ_XY = 1.
3. Find E{X²Y²} for each of the joint pdf's in Exercise 1.
4. Find the correlation coefficient between X and Y for the joint pdf described in
Exercise 4 of Sec. 3.8.
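For the pdf equal to 2 on the triangle x > 0, y > 0, x + y < 1, the correlation coefficient can be checked with exact arithmetic. The sketch relies on the standard Dirichlet integral, which gives E{X^a Y^b} = 2·a!·b!/(a + b + 2)! for this pdf; the function names are hypothetical and this is an illustration rather than the book's method.

```python
from fractions import Fraction
from math import factorial, sqrt

def moment(a, b):
    """E{X^a Y^b} for the pdf f(x,y) = 2 on x > 0, y > 0, x + y < 1."""
    return Fraction(2 * factorial(a) * factorial(b), factorial(a + b + 2))

def correlation():
    """Correlation coefficient of X and Y computed from exact moments."""
    mean_x, mean_y = moment(1, 0), moment(0, 1)
    var_x = moment(2, 0) - mean_x**2
    var_y = moment(0, 2) - mean_y**2
    cov = moment(1, 1) - mean_x * mean_y
    return float(cov) / sqrt(float(var_x * var_y))

if __name__ == "__main__":
    print(correlation())
```

The same `moment` function also gives E{X²Y²} for this pdf as `moment(2, 2)`.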


Section 4.14 


i e joint 
m l. For each of the joint distributions in Exercise 1 of oe find the j 
LT m ioi adf HU. mda ЗА ^ X how is the moment 
Nera OM the joint moment generating function for A, . . . , m, 
Senerating ШОН for X, ib Pim ja obtained? lllustrate with an example. 


3. Find the joint moment generating function for the joint distribution described
in Exercise 4 of Sec. 3.8.
Section 4.16 


1. IF y 


ivari istribution with parameters 4, Ho, бү, O2, р, 
find the 1: Xy have a bivariate normal distribution with para 
с 


Onditional pdf for Y, given that Xs po AME "P 

nd ‘D Xa have E bivariate nd distribution with pt ma бы where 

1 Де joint pdf for Y, = A,X, + B,Xs + Cay Me Mara t Sedge Ыз 

3, & Bi Ba, Су, C, are constants with А,В» + Аар, f chance variables, each 

3. Suppose (X1, Y1), (X2, Y2), ..., (Xn, Yn) are n pairs of chance variables, each
pair being independent of the other pairs, and the joint distribution of (Xi, Yi)
being the same for all values of i. Denote the mean of Xi by A, the mean of Yi
by B, the standard deviation of Xi by C, and the standard deviation of Yi by D.
Define Wn as (X1 + ··· + Xn − nA)/(√n C), Zn as (Y1 + ··· + Yn − nB)/(√n D). Find
the joint moment generating function for (Wn, Zn) in terms of the joint moment
generating function for (Xi, Yi). Show that under conditions analogous to those
discussed in Sec. 4.7, the joint moment generating function for (Wn, Zn) approaches
the joint moment generating function for a bivariate normal distribution.


Section 4.17 


fi 
A 


d deviation of Y; by D. 


Чир Suppose that the probability that a telephone Ses ala тр ааш Me 
“interval (1, у Ar) given that it has not ended before time г, is 0. qgt,An, 
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i i hat 
formly іп . М 
/ At approaches zero, unif 
1,41)]/At approaches zero as ї : 
т ther the conversation lasts for more than 2 min 


is 
i other one i 
2. Suppose that as soon as one telephone conversation ends, an 

started, and that the total len 


distribution for Y. 


; i uipment ` 
3. If X is a chance variable representing the length of life ofa ise ы t 
and X has pdf f(x) and cdf F(), show that the "death rate" at ti 
4. If X has an exponential distribution with parameter θ, find the conditional
distribution of X − A, given that X > A, where A is a positive constant.
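The conditional distribution asked for in the exponential exercise can be checked numerically before it is derived. This sketch assumes the parameterization f(x) = (1/θ)e^(−x/θ) for x > 0, so that θ is the mean; the function names and the numbers used are hypothetical.

```python
from math import exp

def survival(x, theta):
    """P(X > x) when X has pdf (1/theta) * exp(-x/theta) for x > 0."""
    return exp(-x / theta) if x > 0 else 1.0

def conditional_survival(t, a, theta):
    """P(X - a > t | X > a), computed as P(X > a + t) / P(X > a)."""
    return survival(a + t, theta) / survival(a, theta)

if __name__ == "__main__":
    theta, a = 3.0, 2.0              # illustrative values only
    for t in (0.5, 1.0, 4.0):
        print(t, conditional_survival(t, a, theta), survival(t, theta))
```

The two printed columns agree for every t, which suggests what the conditional distribution of X − A must be.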
Section 5.1 


1. Write out a proof of the supporting hyperplane theorem for the case of sets in
three dimensions.


Section 5.6 


$500 per month. When the machine fails, it h 


ision İS 
to buy the machine ог nottobuy. Beforea Деш ОЛ, 
made, the chance variables X, X, are to be Observed, where X1, Xa are inde КОП 
independent of Y, and each has the same distribution as У. Suppose the 
rule s is to choose that decision Which would 
equal to (eX, + Xa). Find the 


uppose that a company is s 
certain time. On i 


fit o 
the company makes a Da an 
:000 for each item undelivered. P 
unknown probability ila ; ат 


Production р 


efore a deci 
» Where X is the number of ite. 


appropriate if 9 
r(0;s). 


73 " * x E 
4. Modify Exercise 3 by giving surplus finished items a value of $100 each.
5. In certain problems, the joint distribution for X, Y is affected by the decision
that is chosen, so that the joint distribution must be written as F(x,y;θ,D). Does
this cause any difficulty in the development? How must the formula for r(θ;s) be
modified? Are any of the exercises above of this type?
6. Suppose that a rancher raising 1,000 head of cattle has to decide whether or not
to include a special vitamin supplement in their diet. Including the supplement
would cost $5,000. When the animals are sold, each is classified as either Grade A or
Grade B, and the price paid for each Grade A animal is $200, the price paid for each
Grade B animal is $180. If the vitamin supplement is not included in the diet
of an animal, the probability that the animal will become Grade A is ½. If the
vitamin supplement is included in the diet of an animal, the probability that the
animal will become Grade A is some unknown value θ between ½ and 1. Before
making his decision, the rancher will observe (at no cost) how many of 5 test
animals fed the supplement become Grade A. Letting X denote the number of
the test animals that become Grade A and Y denote the number of the 1,000
animals that will become Grade A and setting D = 1 if the decision made is not to
include the supplement, setting D = 2 if the decision made is to include the
supplement, what is the function W(y;D;x)? Find r(θ;s) for the decision rule s given by
s(1;x) = 1 for x ≤ 4, s(1;5) = 0.
7. A person has to decide whether or not to insure a piece of jewelry worth
$2,000 against theft for a period of 1 year. To insure the article costs $20. The
probability that the article will be stolen during the year is an unknown value θ.
Before deciding whether or not to insure the article, the person will observe (at no


are stolen. What is the function W(y;D;x)? Find r(θ;s) for the decision rule s
given by s(1;x) = 1 if x ≤ 3, s(1;x) = ½ if x = 4, 5, s(1;6) = 0.
pn Exercise 7, suppose that if the person insures the article, à legis wa 
i surance on his door which reduces the probability that the article x ie 7 
(9 * Under these circumstances, answer the same questions as In оке е 
5 Modify Exercise 7 by making it possible to insure the article for any : mire 
u Cen 0 and $2,000. Let D denote the amount for which the ru is d e à 
t Ppose the cost of insuring the article for D dollars is equal to 00 Жы. со 
unction W(y;D:x)? Compute r(0;s) for the decision rule s which se q 
x16). ` 


Section 5.7


1. Suppose Exercise 3 of Sec. 5.6 is modified so that θ is known to be either 0.1
or 0.4, and D can be no greater than 8. Graph the convex set C.
or g, "pose Exercise ог Sec. 5.6 is modified so that 0 is known to be either 0.2 


С: 
D can be no greater than 7. Graph the convex set. M Ине 
3% op у РРО5е that Exercise 6 of Sec. 5.6 is modified so that 0 is known to be ei 


/8 Graph the convex set C. " ither L4 
3 А x known to be either 
or 3; "РРозе Exercise 7 of Sec. 5.6 is modified so that 0 is kn 74 


24: Graph th С. | 
5. Suppose Exercise 8 of Sec. 5.6 is modified so that θ is known to be either 0.1
or 0.8. Graph the convex set C.
Section 5.8 


1. From the diagram of Exercise 1 of Sec. 5.7, find a point representing a Bayes
decision rule relative to b(0.1) = 0.3, b(0.4) = 0.7.
2. From the diagram of Exercise 2 of Sec. 5.7, find a point representing a Bayes
decision rule relative to b(0.2) = 0.5, b(0.3) = 0.5.


ы ; ayes 
3. From the diagram of Exercise 3 of Sec. 5.7, find a point representing a Baye: 
decision rule relative to (34) = 0.4, Ы) = 0.6. | А és 
4. From the diagram of Exercise 4 of Sec. 5.7, find a point representing a Bay 
decision rule relative to b(14) = 0.8, b(34) = 0.2. 
5. From the diagram of Exercise 5 of Sec. 5.7, find a point representing a Bayes
decision rule relative to b(0.1) = 1, b(0.8) = 0.


Section 5.9


1. For Exercise 1 of Sec. 5.7, construct a Bayes decision rule s relative to b(0.1)
= 0.3, b(0.4) = 0.7. Compute r(0.1;s), r(0.4;s).
2. For Exercise 2 of Sec. 5.7, construct a Bayes decision rule s relative to b(0.2)
= 0.5, b(0.3) = 0.5. Compute r(0.2;s), r(0.3;s).
3. For Exercise 3 of Sec. 5.7, cons 
= 0.4, 007%) = 0.6. Compute r(34: 
4. For Exercise 4 of Sec. 5.7, c 
= 0.8, (34) = 0.2. 
5. For Exercise 5 of Sec. 5.7, construct a Bayes decision rule s relative to b(0.1)
= 1, b(0.8) = 0. Compute r(0.1;s), r(0.8;s).
6. Modify Exercise 9 of Sec. 
1,000, 2.000 and the only possib 
decision rule s relative to b(14) - 


Section 5.10 


1. In Exercise 1 of Sec. 5.6, suppose the only possible values of D are the integers
from 0 to 10. Find a Bayes decision rule s relative to the a priori distribution
with pdf b(θ) = e^(−θ) for θ > 0.

2. In Exercise 2 of Sec. 5.6, find 
distribution В(0) with Pdf 6(0) = 0.01е- 0.010 for 0 >. 0, ; ers 
3. In Exercise 3 of Sec. 5.6, Suppose the only possible values for D are the inte? 
from 0109. Finda Bayes decision rule 5 relative to the a priori distribution 

with pdf (0) — | for 0 <0 = | 


= К the 
_ 4. In Exercise 4 of Sec. 5.6, Suppose the Only possible values for D Ar etie 
Integers from 0 to 9. Find a Bayes decision rule s relative to the a prior! 

bution B(0) with pdf b(0) — 20 Гого = 9 = 1 


: In Exercise 6 of Sea. 5 
priori distribution BO) - 
In Exercise 7 of Sec 


Ее a priori 
а Bayes decision rule s relative to the a [ 


А | the 2 
ayes decision rule s relative (0 
Compute r(0 5s). 


Compute r(0;s). 


" К {һе а 
wei TI ERI 5 a Ba isi relative to 
priori distribution BO) - for то ^ ауе$ decision rule 


00, 
ы n Modify Exercise 9 of Sec. 5.6 so that the only possible values of D are 0 Den 
200. Construct а Bayes decision rule у relative to the a priori distribu 

(0) = @ for 0 Ek Compute (0 5) 
Section 5.11 


pou 


Aw? 
n 
OG 
5 
* 
E 
2 
e 
Р 
БЧ 
be demanded. Y has a normal distribution, with mean θ1 and standard deviation
θ2. The problem is to decide how much to stock. D can be any nonnegative
number. The net profit rate is $5 per unit sold, −$2 per unit unsold. Before a
decision is chosen, X1, X2, X3 will be observed, where these are independent,
independent of Y, and each has the same distribution as Y. Find a Bayes decision
rule relative to the a priori distribution B(θ1,θ2) with


0.01 1 Р 
df 5(0,,0,) — ——— exp} == (0, — 10,000)? — 0.010, for 0. > 0 
pdf 2(0,,0,) TUE zf ТОО? | 1 ) | 


b(0,,0,) 0 for 0, < 0 
3. For Exercise 9 of Sec. 5.6, construct a Bayes decision rule relative to the a 
Priori distribution B(0) 0 for O - 0 - 1. | 
6. For Exercise 9 of Sec. 5.6, construct a Bayes decision rule s relative to the a 
ге distribution B(0) which has jumps equal to 14 at 0 = 0, 15, 1. Compute 
rUs). 
Section 5.12


d: For Exercise 1 of Sec. 5.7, find (2.3), 850-2, 3) ~ 2,4), 990.0. (7 3,7). 
^ For Exercise 3 of Sec. 5.7, find g,(1.1), (0, — 1), 8203.3), gal — 1,3). 
Section 5.13 


1. For Exercise 1 of Sec. 5.6, find a function of X1, X2, X3 which is sufficient.
2. For Exercise 2 of Sec. 5.6, find a function of X1, X2 which is sufficient.
3. In Exercise 6 of Sec. 5.6, X was defined as the number of test animals that
became Grade A out of 5 test animals observed. More detailed information could
be given by numbering the test animals and defining Xi as equal to 1 if the ith test
animal becomes Grade A and as equal to 0 otherwise. Show that knowing only X
is as good as knowing X1, ..., X5.
4. In Exercise 7 of Sec. 5.6, X was defined as the number of observed articles
that were stolen. More detailed information could be given by numbering the
observed articles and defining Xi as equal to 1 if the ith observed article is stolen
and as equal to zero otherwise. Show that knowing only X is as good as knowing
X1, ..., X6.
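The claim that the total count is as informative as the individual 0/1 outcomes can be illustrated by enumeration. The sketch below, with hypothetical names, computes the conditional probability of each 0/1 sequence given its total for two different values of θ; if the total is sufficient, these conditional probabilities should not depend on θ.

```python
from itertools import product
from math import comb

def conditional_given_total(theta, n=5):
    """Map each 0/1 sequence of length n to P(sequence | its total) for 0 < theta < 1."""
    table = {}
    for seq in product((0, 1), repeat=n):
        k = sum(seq)
        p_seq = theta**k * (1 - theta)**(n - k)   # independent trials
        p_total = comb(n, k) * p_seq              # binomial probability of total k
        table[seq] = p_seq / p_total
    return table

if __name__ == "__main__":
    a = conditional_given_total(0.6)
    b = conditional_given_total(0.9)
    print(max(abs(a[s] - b[s]) for s in a))       # difference between the two tables
```

Each conditional probability comes out as 1/C(n, k), whatever θ is, so the individual outcomes add nothing about θ once the total is known. This is an illustration for independent trials with a common θ, not a proof.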
Section 5.14 


1. For Exercise 1 of Sec. 5.7, find a minimax decision rule.
2. For Exercise 2 of Sec. 5.7, find a minimax decision rule.


is à person feels that he knows which particular value of 0 is the true one, what’ 


e a i кашек айй М P apti n ne i 
relati, appropriate a priori distribution to use? How is a Bayes yg i ru ч 
of pre to this a priori distribution constructed? In particular, what is the ro 

1, 9 


4. Find a minimax decision rule for each of Exercises 3, 4, 5 of Sec. 5.7.
Section 6.1


1 " Š aracteristi B 
1. Suppose there are two fuel types, with the following characteristics:

                            Fuel type
                            1       2
Weight per unit volume      15      10
Energy per unit volume      30      25
Cost per unit volume        1       0.8

Suppose a mixture of at least 8 units of volume must be made, weighing no more
than 120 units and containing at least 260 units of energy. Let x1, x2 be
respectively the number of units of volume of fuel type 1 and fuel type 2 in the
mixture. In the x1, x2 plane, sketch the set G of all points consistent with the
volume, weight, and energy restrictions. On this diagram, indicate the point of G
which minimizes the cost of the mixture.
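The fuel-mixture problem is a two-variable linear program, so a sketch of the feasible set can be checked by enumerating corner points. The code below assumes the data as read from the exercise (weights 15 and 10, energies 30 and 25, costs 1 and 0.8, at least 8 units of volume, at most 120 units of weight, at least 260 units of energy); the function names are hypothetical, and corner enumeration is used here only because the feasible set is bounded.

```python
from itertools import combinations

# Each constraint is a*x1 + b*x2 >= c (the weight limit is multiplied by -1).
CONSTRAINTS = [
    (1.0, 1.0, 8.0),          # volume: x1 + x2 >= 8
    (-15.0, -10.0, -120.0),   # weight: 15*x1 + 10*x2 <= 120
    (30.0, 25.0, 260.0),      # energy: 30*x1 + 25*x2 >= 260
    (1.0, 0.0, 0.0),          # x1 >= 0
    (0.0, 1.0, 0.0),          # x2 >= 0
]
COST = (1.0, 0.8)

def feasible(pt, tol=1e-9):
    """True if the point satisfies every constraint (within a small tolerance)."""
    return all(a * pt[0] + b * pt[1] >= c - tol for a, b, c in CONSTRAINTS)

def vertices():
    """Feasible intersection points of pairs of constraint boundary lines."""
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                      # parallel boundaries
        x1 = (c1 * b2 - c2 * b1) / det    # Cramer's rule
        x2 = (a1 * c2 - a2 * c1) / det
        if feasible((x1, x2)):
            pts.append((x1, x2))
    return pts

def cost(pt):
    return COST[0] * pt[0] + COST[1] * pt[1]

def cheapest_mixture():
    """Corner of the feasible set with the smallest cost."""
    return min(vertices(), key=cost)

if __name__ == "__main__":
    best = cheapest_mixture()
    print(best, cost(best))
```

Under these assumed data the feasible set is a triangle, and comparing the cost at its three corners reproduces what the graphical method shows.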


Section 6.2


1. Suppose there are four types of food, with the following properties: 


Amt. of vitamin A/unit vol. 
Amt. of vitamin Bļunit vol. 
Amt. of vitamin C/unit vol, 
Amt. of vitamin Dlunit vol. 


Cost per unit volume 


: in D. 
A Н vitamin £ 
, at least 25 units of vitamin C, and at least 18 units of ts at mini- 
i Pes of food achieving these requiremen 


arge 
E itrarily larg 
show that the convex Set G has points with arbitrarily 
coordinates, Why does thi 


ion? What 
is fail to cause апу trouble in the computation? 
would happen if we tried to maximize the cost? 


Section 6.3 


mand for the product has ¢ is 0.5, 
andard deviation 200; if the proportion d 


However, at the start of 
.20 apiece, or 16,000 units at 

itional units, Whenever ne 
Ons: (1) buy 8,000 units 
Before choosing one o 


ecia 
ve how many are of the sp 
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1 i less. 
à " aking unsold units valuel 14,34. 1. 
3 Mod cem gr aiy allowing the а values of 0 to be 5, 34 

- Modify E» 2. а й 

4 dify Exercise 7 of Sec. 5. ya K len, : 
ч. оу minimax decision = е noe istic tons of 0 to be 14, %, 
5. Modify Exercise 8 of Sec. 5.6 by allow и lem. 
7: Ril nine decision гше for the resulting prob 


Section 7.2 


1. Find a minimax decision rule for the example in the text when it costs $50 to
observe a consumer.
2. Find a minimax decision rule for the example in the text when it costs $150
to observe a consumer.


Section 7.5


i e ase a(G,3d,) = 0, 
l. Findana roximately minimax Wald «ioter apad oo “Ty ki SEL 
Cid) = 300,000, 4(Gs:d;) = 200,000, a(G5:d;) = 0, i 


Section 7.6 


i be 0.1, 
Г i hat 0 is known to 
l. If the illustrative example in the text is а 00: what ‘decision mil 
What decision rule should be used? If 0 is known » 
i ther 
be used? le in the text is modified so that 0 is known to be ei 
. Ifthe illustrative example in | 
А "n “1 : Е 
"For qund а minimax ка ied find a Bayes decision rule relative to th 
For the illustrative example in the text, 
lori diste «0 «l1. ja abc 
а priori distribution B(0) = 0 for 0 < 0 7 ies of a monthly magazine at only 
; Suppose that a newgdealer can obtain and] he middle of the month. At 
two timer, at the beginning of the month mean t A 15 cents регсору. Hecan 
Sither time, he can obtain as many copies S ewan d ; times: at the middle of the 
return an ‘number of copies to the distributor at two Е at: testi een ҮКЕ 
td or at the end of the month. For each my ‹ f the month he returns all 
™onth he receives 10 cents per copy, and at the ат price is 30 cents per copy. 
the iier Lud Resim S.cenfs a den к from the dealer during E EE 
е agazines that will be dema ring the second half 
s ofthe onth ар the number ees ge ста і distribution with 
the mo th ask dep dent chance variables, each э ig aoe many magazines 
the Same mk To игн 0. The dealer must oir order e sr the 
unknown pz i di equis 1 goal the 
to order inning of the month ап fy Ken Kes Жугер 
at the beginning o mc à AX, Xa Xs, X, 
middle of the n Before ed aS een dU Wd ame c 
Are inde nce variables, each wit iori distribution B(0) = 1 — е—0. 
inda pee dede rule relative to the a priori dist 
for 0 20 


0.1 


1 to 100, what decision 
5. Ir Exercise 4 is modified so that 0 is known to be equa 


"ds ? ion process is a three-stage 
1 E TAA le in the text so that the adus om the third stage. It 

Process End only [texts ourdiig the first two stages ca 

Costs 


i Find a 
: of production. 
1 h the third stage trieta 
a tem started throug i distribution B(O) = 0" for 0 < 
ayes ee istis to the a priori шш vale: EGS not necessary 
i f the Wald seque iscussed in Secs. 7.3 and 7. 
i oie E elements in the problem discu: 


îlloweq us to “work forward” ? 


RY 
Section 8.2


1. If Zm has a binomial distribution with parameters m, p, the central limit
theorem tells us that the cdf for (Zm − mp)/√(mp(1 − p)) approaches the standard
normal cdf as m increases. Use this fact to approximate P(|Zm/m − p| < δ) for
large m. Compare the approximation to the bound on this probability
given by Tchebycheff’s inequality.
2. Suppose X is a chance variable with mean A and variance B. Prove that for
any positive constant c, P(|X − A| < c) ≥ 1 − B/c².
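The comparison between the central-limit approximation and Tchebycheff's inequality can be made concrete. In this sketch the tolerance is called `delta`, the numbers m = 100, p = 0.5, delta = 0.1 are illustrative choices only, and the standard-library `statistics.NormalDist` stands in for a normal table.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx(m, p, delta):
    """Central-limit approximation to P(|Z/m - p| < delta), Z binomial(m, p)."""
    z = delta * sqrt(m / (p * (1 - p)))
    return 2 * NormalDist().cdf(z) - 1

def tchebycheff_bound(m, p, delta):
    """Tchebycheff lower bound 1 - Var(Z/m)/delta^2 on the same probability."""
    return 1 - p * (1 - p) / (m * delta**2)

if __name__ == "__main__":
    m, p, delta = 100, 0.5, 0.1      # illustrative values only
    print("normal approximation:", normal_approx(m, p, delta))
    print("Tchebycheff bound   :", tchebycheff_bound(m, p, delta))
```

The inequality holds for every distribution with the given variance, so it is typically much more conservative than the approximation that uses the binomial shape.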
Section 8.3 | nt Btnot АЙ = 
1. If Ay, А»... А are mutually exclusive by pairs, show that P( 
P(not 45) + ۰< 4 P(not А„) > 1. 
Section 8.4 
1. Write out a complete proof for Theorem 2 for the case where F(x) is not
necessarily continuous.


2. Suppose Х,..., X, are inde 
Define Z as max АО, ous wy aX 


js not 
fo: F(x) is n 
г the case where F(x) ! 


В df F(x). 
pendent, each with the d suy My 
т) — Р(х). Define У, аѕ Р(Х) fori = l, 
Ун | : W. 
and define Was max Ау; Yayo... Y.) = J^ ShowthatZ ibution of 

Оу А istribu 
3. Use Exercise 2 to show that if F(x) is continuous, the distr 
MAX HH: Ж... ‚ Xn) — F(x)! does not depend on F(x). 
т 


Section 8.5 


1. Suppose the news vendor's proble dee 
Profits above zero, Where the tax on a net profit of p is equal to p oblem. E 
given positive Constant. Find the empirical decision rule for this рг 


| nt and 4! 
5 problem, Suppose X;,..., Y, are mop 1 RT 
Probability distribution as а + t fori = 2,..., те empirica 
known positive constant. What is a reasonable modification of t 
decision rule? 
3. In Exercise 2 


^ Suppose / is an 
modification of thee 


Section 9.1 


on all net 


в 5 ; 3 a tax 2 д 
m is modified by imposing a t tp), Та 


asonable 
at is a re 
unknown constant, What is û 
mpirical decision Tule? 


1. What is the pro 

5.13 for conventional 
2. Find W(9:p 

Section 9.2 


ig Sec 
ч : iven in 
Per modification of the definition of sufficiency £ 

Statistical Problems? 


5х) for each exercise of Sec. 5.6. 


df 
oint Р А 
2. Suppose group I contains a single continuous distribution, with joint pdf
f(x;1), and group II contains a single continuous distribution, with joint pdf f(x;2).
Show that any admissible decision rule has the form: choose decision 1 for each x
for which f(x;2)/f(x;1) < c; choose decision 2 for each x for which f(x;2)/f(x;1)
> c, where c is a constant.
Section 9.3
1. Finda 


Xa arê 
А minimax test of leve] 
independent, €ach with 


Ау,“ 
е 
of significance 0.10 for the case wher 
and 4 = 2,B =4 


sr 
arameter 
an exponential distribution with unknown р 




x Find a minimax decision rule of level of significance 0.2 for the case where 
D А, X, are independent, each with the probability distribution 


! Possible values 0 1 


Probabilit 1-0 0 
and 4 — M, B = 3. у 


Section 9.4 


"7 Find a minimax test of level of significance 0.05, for the case where X, X, are 
Кы, each with an exponential distribution with unknown parameter 0 and 
=, By 1, В, = 4. 

2. E oie "—— 
inde Find a minimax test of level of. significance 0.10, for the case where X;, X, are 
d ependent, each with a normal distribution with mean zero and unknown standard 
Eus 0, and А = 6, B, = 2, B, = 9. 
X. Find a minimax test of level of significance 0.15 for the case where Xj, Xs, Xa, 
E independent, each with the distribution described in Exercise 2 of Sec. 9.3, 
4 = Ya, By = M, В, = 34. 


Section 9.5


S IP A ае X, are independent, each with the distribution described in 
"m Mese 2 of Sec. 9.3, find the estimate of 0 given by the Bayes decision rule relative 
= 4 a priori distribution B(0) = 0 Гого < 0 -< 1, where W(0;D) = (D — 0). 


ne C CNN. Y,, are independent, each with pdf /(x;0) = (1/0)e~*!" for x > 0, 

‘to th ) — O for x < 0, find the estimate of 0 given by the Bayes decision rule relative 
€ cdf B(0) = | — e-9.99 for 0 0, where W(0;D) = (D — 0)2/02. 

Section 9.6


1. Suppose $X_1, X_2, \dots$ are independent chance variables, each with a normal distribution with variance equal to 1 and unknown mean $\theta$. The problem is to give a point estimate of $\theta$, and this estimate can be based on any predetermined number $m$ of the chance variables $X_1, X_2, \dots$. If $m$ variables are used and the estimate is $D$, the loss is $5m + 100(D - \theta)^2$. Describe a minimax decision rule for this problem. The rule must specify the value of $m$ and how the decision is made once $X_1, \dots, X_m$ are observed.


Section 9.7


1. If $X_1, \dots, X_m$ are all independent, each with pdf $f(x;\theta) = e^{-(x-\theta)}$ for $x > \theta$, [...]


1. If $X_1, \dots, X_m$ are all independent, each with pdf $f(x;\theta) = (1/\theta)e^{-x/\theta}$ for $x > 0$, $f(x;\theta) = 0$ for $x < 0$, find the best invariant estimate of $\theta$.


1. If $X_1, \dots, X_m$ are all independent, each with a uniform distribution between $\theta_1$ and $\theta_2$, where $\theta_1 < \theta_2$, find the maximum-likelihood estimates for $\theta_1, \theta_2$.

2. Suppose $X_1, \dots, X_m$ are all independent, each with a normal distribution with known standard deviation $\sigma$. $E\{X_i\} = \theta_1 + \theta_2 t_i$, where $t_1, \dots, t_m$ are given known values, and $\theta_1, \theta_2$ are unknown. Find the maximum-likelihood estimates for $\theta_1, \theta_2$. Show that these estimates have a joint normal distribution.

3. In Exercise 2, suppose $\theta_1, \theta_2, \sigma$ are all unknown. Find the maximum-likelihood estimators for these three parameters.


Section 9.11 


1. Suppose $X_1, \dots, X_m$ are all independent, each with a normal distribution with unknown mean $\theta_1$ and unknown standard deviation $\theta_2$. Group I consists of all distributions with $\theta_1 = A$, where $A$ is a given value. Find the likelihood-ratio rule for testing this hypothesis. Show that a table of the $t$ distribution can be used to help specify the rule.


2. Suppose $X_1, \dots, X_m$ are all independent, each with a uniform distribution between $\theta_1$ and $\theta_2$, where $\theta_1 < \theta_2$. Group I consists of all distributions with $\theta_2 - \theta_1 < 2$. Find the likelihood-ratio rule of level of significance 0.1 for testing this hypothesis.


APPENDIX 




TABLE 1


This table gives the value of $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-y^{2}/2}\,dy$ for the values of $z$ listed. The value of $z$ for any cell is given by adding the number at the extreme left of the row in which the cell appears to the number at the extreme top of the column in which the cell appears. The value of the integral for a negative $z$ can be found by subtracting its value for $-z$ from 1.


z   | 0.00   | 0.01   | 0.02   | 0.03   | 0.04   | 0.05   | 0.06   | 0.07   | 0.08   | 0.09
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141
0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517
0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879
0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224
0.6 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549
0.7 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852
0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133
0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389
1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621
1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830
1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015
1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177
1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319
1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9429 | 0.9441
1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9484 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545
1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633
1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9699 | 0.9706
1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9761 | 0.9767
2.0 | 0.9772 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817
2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857
2.2 | 0.9861 | 0.9864 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890
2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916
2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936
2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952
2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964
2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974
2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9979 | 0.9980 | 0.9981
2.9 | 0.9981 | 0.9982 | 0.9982 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986
3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990
3.1 | 0.9990 | 0.9991 | 0.9991 | 0.9991 | 0.9992 | 0.9992 | 0.9992 | 0.9992 | 0.9993 | 0.9993
3.2 | 0.9993 | 0.9993 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9994 | 0.9995 | 0.9995 | 0.9995
3.3 | 0.9995 | 0.9995 | 0.9995 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9996 | 0.9997
3.4 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 0.9998
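
Entries of Table 1 can be spot-checked numerically. The following is a minimal sketch, not part of the original text. It assumes the identity that the standard normal integral up to z equals (1/2)[1 + erf(z/sqrt(2))], and it uses Python's standard `math.erf`. The function names `std_normal_cdf` and `table_entry` are hypothetical, chosen only for this illustration.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Integral of the standard normal density from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def table_entry(row: float, col: float) -> float:
    """Mimic a table lookup: z is the row label plus the column label.

    The result is rounded to four places, as in the printed table.
    """
    return round(std_normal_cdf(row + col), 4)


if __name__ == "__main__":
    # Row 1.0, column 0.00, and row 1.9, column 0.06.
    print(table_entry(1.0, 0.00))
    print(table_entry(1.9, 0.06))
    # Negative z: subtract the value for -z from 1, as the table note says.
    print(round(1.0 - std_normal_cdf(0.5), 4), round(std_normal_cdf(-0.5), 4))
```

The last line illustrates the rule for negative z: the value computed directly at z = -0.5 agrees with 1 minus the value at z = 0.5.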
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APPENDIX 


TABLE 3


If $T$ has a $t$ distribution with $n$ degrees of freedom, this table gives the value of $t$ for which $P(-t < T < t) = A$, for the values of $n$ listed in the column at the left of the table and the values of $A$ listed in the row at the top of the table.


Value of n (left column), value of A (top row)

n  | 0.99  | 0.98  | 0.95  | 0.90  | 0.80  | 0.70  | 0.60  | 0.50  | 0.40  | 0.30  | 0.20  | 0.10
1  |       |       |       |       |       | 1.963 | 1.376 | 1.000 | 0.727 | 0.510 | 0.325 | 0.158
2  |       |       |       |       |       | 1.386 | 1.061 | 0.816 | 0.617 | 0.445 | 0.289 | 0.142
3  |       |       |       |       |       | 1.250 | 0.978 | 0.765 | 0.584 | 0.424 | 0.277 | 0.137
4  |       |       |       |       |       | 1.190 | 0.941 | 0.741 | 0.569 | 0.414 | 0.271 | 0.134
5  |       |       |       |       |       | 1.156 | 0.920 | 0.727 | 0.559 | 0.408 | 0.267 | 0.132
6  |       |       |       |       |       | 1.134 | 0.906 | 0.718 | 0.553 | 0.404 | 0.265 | 0.131
7  |       |       |       |       |       | 1.119 | 0.896 | 0.711 | 0.549 | 0.402 | 0.263 | 0.130
8  |       |       |       |       |       | 1.108 | 0.889 | 0.706 | 0.546 | 0.399 | 0.262 | 0.130
9  |       |       |       |       |       | 1.100 | 0.883 | 0.703 | 0.543 | 0.398 | 0.261 | 0.129
10 |       |       |       |       |       | 1.093 | 0.879 | 0.700 | 0.542 | 0.397 | 0.260 | 0.129
11 |       |       |       |       |       | 1.088 | 0.876 | 0.697 | 0.540 | 0.396 | 0.260 | 0.129
12 |       |       |       |       |       | 1.083 | 0.873 | 0.695 | 0.539 | 0.395 | 0.259 | 0.128
13 |       |       |       |       |       | 1.079 | 0.870 | 0.694 | 0.538 | 0.394 | 0.259 | 0.128
14 |       |       |       |       |       | 1.076 | 0.868 | 0.692 | 0.537 | 0.393 | 0.258 | 0.128
15 |       |       |       |       |       | 1.074 | 0.866 | 0.691 | 0.536 | 0.393 | 0.258 | 0.128
16 |       |       |       |       |       |       |       | 0.690 | 0.535 | 0.392 | 0.258 | 0.128
17 |       |       |       |       |       |       |       | 0.689 | 0.534 | 0.392 | 0.257 | 0.128
18 |       |       |       |       |       |       |       | 0.688 | 0.534 | 0.392 | 0.257 | 0.127
19 |       |       |       |       |       |       |       | 0.688 | 0.533 | 0.391 | 0.257 | 0.127
20 |       |       |       |       |       |       |       | 0.687 | 0.533 | 0.391 | 0.257 | 0.127
21 |       |       |       |       |       |       |       | 0.686 | 0.532 | 0.391 | 0.257 | 0.127
22 |       |       |       |       |       |       |       | 0.686 | 0.532 | 0.390 | 0.256 | 0.127
23 |       |       |       |       |       |       |       | 0.685 | 0.532 | 0.390 | 0.256 | 0.127
24 |       |       |       |       |       |       |       | 0.685 | 0.531 | 0.390 | 0.256 | 0.127
25 |       |       |       |       |       |       |       | 0.684 | 0.531 | 0.390 | 0.256 | 0.127
26 |       |       |       |       |       |       |       | 0.684 | 0.531 | 0.390 | 0.256 | 0.127
27 |       |       |       |       |       |       |       | 0.684 | 0.531 | 0.389 | 0.256 | 0.127
28 |       |       |       |       |       |       |       | 0.683 | 0.530 | 0.389 | 0.256 | 0.127
29 |       |       |       |       |       |       |       | 0.683 | 0.530 | 0.389 | 0.256 | 0.127
30 |       |       |       |       |       |       |       | 0.683 | 0.530 | 0.389 | 0.256 | 0.127
∞  |       |       |       |       |       |       |       | 0.674 | 0.524 | 0.385 | 0.253 | 0.126


Source: Reprinted from R. A. Fisher, “Statistical Methods for Research Workers,” Table IV, Oliver & Boyd Ltd., Edinburgh, by permission of the author and publishers.
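
Entries of Table 3 can also be checked numerically. The sketch below is an illustration under stated assumptions, not the method used to compile the printed table. It integrates the standard $t$ density with composite Simpson's rule and solves $P(-t < T < t) = A$ by bisection, using only Python's standard library. The names `t_pdf`, `central_prob`, and `t_value`, and the step count of 400, are hypothetical choices made for this example.

```python
import math


def t_pdf(x: float, n: int) -> float:
    """Density of the t distribution with n degrees of freedom."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log1p(x * x / n))


def central_prob(t: float, n: int, steps: int = 400) -> float:
    """P(-t < T < t): Simpson's rule on [0, t], doubled by symmetry.

    steps must be even for the composite Simpson weights to be valid.
    """
    h = t / steps
    total = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 2.0 * total * h / 3.0


def t_value(a: float, n: int) -> float:
    """Solve P(-t < T < t) = a for t by bisection (requires 0 < a < 1)."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, n) < a:  # widen the bracket until it contains t
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if central_prob(mid, n) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Compare with the rows n = 1 and n = 5 in the column A = 0.50.
    print(round(t_value(0.50, 1), 3), round(t_value(0.50, 5), 3))
```

Bisection is used here because the central probability is increasing in $t$, so no derivative is needed. For values of $A$ close to 1 and small $n$ the bracket grows large, and a finer step count may be needed for three-place accuracy.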
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Convergence, stochastic, 129-131, 133 
Convex set, 61, 72, 74 
Correlation coefficient, 55 
Covariance, 54 
Cumulative distribution function, 14, 24, 
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conditional, 22 

empirical, 129, 131-135 

general properties of, 26 
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Exponential distribution, 60 


F distribution, 53, 54, 163 
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Fundamental Occurrence, 5, 6 


Hypergeometric distribution, 44, 45 
Hypothesis, accepting, 138 
rejecting, 138 
testing, 137-146 
by likelihood-ratio method, 161-163 
one-sided, 140-144
two-sided, 144-146 
Inadmissible decision rule, 71, 75 
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Interval estimation, 156-158 
Invariance condition, 149, 153 
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Likelihood ratio method, 161-163 
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Maximum likelihood method, 1 
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Minimax test, 138 
construction of, 138-146 
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bivariate, 58, 59 
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Notation for decision problems. 


Objective function, 92 
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Pdf (see Probability density func 
Point estimation, 146-156 
Poisson distribution, 43, 44 
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Probability, axiomatic development of, 
5 

basic rules of, 2-5, 7
conditional, 3 
intuitive definition of, 1 
subjective, 89 

Probability density function, 28, 29 
conditional, 37, 40 
joint, 33, 34, 39 
marginal, 37-39 

Probability distribution, 11, 12, 24 
conditional, 21-23 
joint, 16, 17 
marginal, 19, 20, 22, 23 


Random variable, 11 
Randomization in decision rules, 67, 
83-86 


Regular decision, 124 
Risk, 68 


Sample mean, 88 

Sample variance, 88 

Sampling decision, 108-124 

Scale parameter, estimation of, 153-156 
Sequence of decisions, 108-128 
Sequential sampling, 111-124 
Significance, level of, 138 

Simplex method, 95-106 

Slack variable, 92 

Standard deviation, 41 
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Standard normal distribution, 46, 50, 51 

Statistical problems, 12, 64-66 
standard formulation of, 136, 137 

Stochastic convergence, 129-131, 133 

Stochastic variable, 11 

Subjective probability, 89 

Sufficiency, 86-88 

Supporting hyperplane theorem, 61-64 


t distribution, 52, 53, 57
Tableau, 97-106 
starting, 101 
Tchebycheff’s inequality, 129, 130 
Testing a hypothesis (see Hypothesis) 
Transformation of chance variables, 
30-33, 35-37, 39, 40 


Uncorrelated chance variables, 55 
Uniform distribution, 45, 46 


Variance, 41 
analysis of, 163 
sample, 88 

Variate, 11 


Wald sequential rule, 118 
minimax, 118-124 

Well-balanced spinner, 13, 25, 26 

Well-shuffled deck, 8 


